Introduction to Mazeinda

Introduction

Mazeinda provides functions for detecting and testing the significance of pairwise Monotonic Association for Zero-Inflated non-negative Data with the measure proposed by Pimentel(2009)[1]. This package can handle any degree of zero inflation as well as non-zero ties. The unique requirement is the independence of the entries of the vectors to be correlated. Pairwise association and testing of a large number of vectors can be expedited with parallel computations thanks to the packages “foreach” and “doMC”.

A development of τ^*

In the setting of zero inflated data Pimentel(2009) suggested that τ^* can be re-expressed as where

Recall that two pairs of jointly distributed non-negative continuous random variables (X_i, Y_i) and (X_j, Y_j) are said to be concordant if (X_i − X_j)(Y_i − Y_j) > 0 and discordant if (X_i − X_j)(Y_i − Y_j) < 0. There are 16 possible different realizations of two pairs of the realization of the random variables (x_i, y_i) and (x_j, y_j) displayed in Table below.

To calculate the probability of concordance we apply the law of total probability using the two left most columns of Table , it follows that Since for positive values of the continuous random variables X and Y we have P(X_i < X_j) = 1 − P(X_i > X_j) and P(Y_i < Y_j) = 1 − P(Y_i > Y_j), reduces to

The same logic can be applied for the calculation the probability of discordance using the first and third columns of Table to show that

Finally, Kendall’s tau formula is the difference of and so that

##Simulations The choice of the measure proposed in Pimentel(2009)[1] over the one by Pimentel, Niewiadomska-Bugajb and Wang(2015) [3] is motivated by the proof above and by the following simulations which suggest that the method considered in this package has superior performance in terms of bias and power.

Let:

τ_b be Kendall’s τ_b
τ_t= p₁₁²t₁₁ + 2(p₀₀p₁₁ − p₀₁p₁₀)
τ_p= p₁₁²t₁₁ + 2(p₀₀p₁₁ − p₀₁p₁₀) + 2p₁₁[p₁₀(1 − 2p₁) + p₀₁(1 − 2p₂)]
d= τ_t − τ_p= 2p₁₁[p₁₀(1 − 2p₁) + p₀₁(1 − 2p₂)]

The zero-inflated data is derived from a beta zero-inflated distribution with parameters (0.5, 1) and probability mass at 0 of 0.3, 0.5 and 0.8. The first figure below shows the histograms of τ_b − τ_p and the following the histogram of τ_b − τ_t obtained with 10.000 vectors with 30 entries.

While in the histograms for 0.8 inflation τ_t and τ_p seem to behave similarly and the mode in the histogram for 0.8 of the difference d is close to zero, the histograms for 0.3 and 0.5 inflation suggest that τ_t is a better estimator for Kendall’s τ_b as the mode is centered at 0.

The means of the values plotted in the histograms above confirm these results:

means of the two statistics
	tau_p	tau_t
infl=0.3	0.1328465	0.0021578
infl=0.5	0.0536823	-0.0005492
infl=0.8	-0.0024742	0.0327359

##Comments about τ* and code The formula for the variance of τ* proposed in Pimentel’s thesis [1] cannot be applied when the parameters p_ij attain the values 0 or 1: the proof relies on the delta method which can be applied only when functions are differentiable. At the extrema of the interval [0,1] no function is differentiable, therefore, whenever any of the parameters p_ij attains the values 1 or 0, the standard R cor.test function for Kendall’s correlation is used to obtain p-values, when applicable, as the function does not work in case of vectors with zero variance.

In case of non-zero ties, τ₁₁ is calculated as Kendall’s τ_b. If all the non-zero entries of at least one vector are tied, τ₁₁ os set to 0. This choice is in accordance with the treatment of ties in Kendall (1970)[2]. τ₁₁ is set to 0 also in the case of only one pair of non-zero corresponding entries.

##Examples Mazeinda provides for the user three functions: for obtaining a matrix of association values, $test \textunderscore associations$ for obtaining the p-values of the output of associate and which gives non-zero values only if the association is significantly different from 0. It should be noted that independence of two variates implies a zero value of τ* , but it is not a sufficient and necessary condition and this affects the power of the test, as dependent cases might get rejected. All these functions can be called with two vectors as arguments or with two matrices with vectors to be correlated as columns. If the vectors to be correlated belong to one matrix m, m should be specified as the argument m1 and as the argument m2.

Note that if the parameters p_ij attain the values 0 or 1, since the R function cor.test must be used, the function combine reports the value of Kendall’s τ_b given by the stardart R cor function, if significantly different from 0. The R cor cannot handle vectors with standard deviation 0, therefore such vectors will yield NA. Below are some examples; warnings are suppressed because the cor.test function prints “cannot compute exact p-value with ties”.

library(mazeinda)

set.seed(234)

x=abs(rBEZI(50, mu = 0.9, sigma = 1, nu = 0.7))
y=abs(rBEZI(50, mu = 0.9, sigma = 1, nu = 0.2))
associate(x,y)

## [1] -0.0256

test_associations(x,y)

## [1] 0.6678331

combine(x,y)

## [1] 0

x=matrix(abs(rBEZI(50, mu = 0.9, sigma = 1, nu = 0.6)),5)
y=matrix(abs(rBEZI(30, mu = 0.9, sigma = 1, nu = 0.3)),5)
knitr::kable(associate(x,y), digits=3, caption="associate(x,y)")

associate(x,y)
0.120	-0.756	0.080	0.080	-0.120	-0.120
0.316	-0.500	0.378	-0.378	0.000	-0.632
0.527	0.240	-0.080	-0.080	0.316	-0.527
0.000	0.667	-0.378	-0.378	0.316	0.000
-0.120	-0.080	0.714	-0.571	-0.359	-0.837
0.359	-0.080	-0.571	1.000	0.359	0.598
0.359	0.630	-0.571	0.080	0.598	0.120
0.105	-0.444	-0.080	-0.080	-0.105	-0.105
-0.600	-0.316	0.359	-0.120	-0.800	0.000
0.316	-0.500	0.378	-0.378	0.000	-0.632

knitr::kable(suppressWarnings(test_associations(x,y)),
             digits=3, caption="test_associations(x,y)")

test_associations(x,y)
0.782	0.087	0.355	0.355	0.782	0.782
0.480	0.277	0.429	0.429	1.000	0.157
0.207	0.232	0.645	0.645	0.448	0.207
1.000	0.147	0.429	0.429	0.480	1.000
0.782	0.645	0.119	0.213	0.405	0.052
0.405	0.645	0.213	0.029	0.405	0.166
0.405	0.154	0.213	0.355	0.166	0.782
0.801	0.298	0.645	0.645	0.801	0.801
0.233	0.448	0.405	0.782	0.083	1.000
0.480	0.277	0.429	0.429	1.000	0.157

knitr::kable(suppressWarnings(combine(x,y)),
             digits=3,caption="combine(x,y)")

combine(x,y)
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	1	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0

knitr::kable(suppressWarnings(associate(x,x)),
             digits=3, caption="associate all vectors in the matrix x pairwise")

associate all vectors in the matrix x pairwise
1.000	0.756	-0.080	-0.378	0.080	0.080	-0.571	0.882	0.359	0.756
0.756	1.000	0.667	-0.250	0.756	-0.378	-0.378	0.667	0.000	1.000
-0.080	0.667	1.000	0.333	-0.080	-0.080	0.378	0.240	-0.316	0.667
-0.378	-0.250	0.333	1.000	-0.378	-0.378	0.756	0.000	-0.316	-0.250
0.080	0.756	-0.080	-0.378	1.000	-0.571	-0.571	-0.080	0.120	0.756
0.080	-0.378	-0.080	-0.378	-0.571	1.000	0.080	-0.080	-0.120	-0.378
-0.571	-0.378	0.378	0.756	-0.571	0.080	1.000	-0.080	-0.598	-0.378
0.882	0.667	0.240	0.000	-0.080	-0.080	-0.080	1.000	0.316	0.667
0.359	0.000	-0.316	-0.316	0.120	-0.120	-0.598	0.316	1.000	0.000
0.756	1.000	0.667	-0.250	0.756	-0.378	-0.378	0.667	0.000	1.000

Computations in parallel

If the number of vectors to be associated is extensive, computations can be done in parallel for any of the functions in the package by specifying “parallel=TRUE” and “n_cor=n”, where n is the number for cores desired.

The parameters p_ij’s necessary for calculating τ* are by default calculated as proportions based on the entries of each pair of vectors to be associated. If the parameters are known, they can be provided to any of the functions in the package by specifying the values as arguments. Only p₁₁, p₀₁ and p₁₀ can be specified. Note that if one of these parameters is specified as an argument, the other two must be specified as well. The parameters p_ij’s can be estimated as by the population mean, calculated using all the pairs of column vectors from the matrices specified as the arguments d1 and d2.

knitr::kable(associate(x,y, estimator="own", p11=0.1,p10=0.1, p01=0.1), digits=3,
             caption="parameters specified")

parameters specified
0.130	0.12	0.12	0.12	0.130	0.120
0.120	0.12	0.12	0.12	0.120	0.120
0.117	0.13	0.12	0.12	0.110	0.110
0.120	0.12	0.12	0.12	0.120	0.120
0.120	0.12	0.11	0.12	0.120	0.120
0.130	0.12	0.12	0.13	0.130	0.110
0.110	0.13	0.12	0.12	0.110	0.110
0.123	0.12	0.12	0.12	0.117	0.130
0.117	0.11	0.13	0.12	0.110	0.123
0.120	0.12	0.12	0.12	0.120	0.120

m1=matrix(abs(rBEZI(50, mu = 0.9, sigma = 1, nu = 0.7)),5)
m2=matrix(abs(rBEZI(30, mu = 0.9, sigma = 1, nu = 0.3)),5)
knitr::kable(associate(x,y, estimator="mean", d1=m1,d2=m2), digits=3,
             caption="parameters estimated as population mean")

parameters estimated as population mean
0.044	0.000	0.000	0.000	0.044	0.000
0.000	0.000	0.000	0.000	0.000	0.000
-0.015	0.044	0.000	0.000	-0.044	-0.044
0.000	0.000	0.000	0.000	0.000	0.000
0.000	0.000	-0.044	0.000	0.000	0.000
0.044	0.000	0.000	0.044	0.044	-0.044
-0.044	0.044	0.000	0.000	-0.044	-0.044
0.015	0.000	0.000	0.000	-0.015	0.044
-0.015	-0.044	0.044	0.000	-0.044	0.015
0.000	0.000	0.000	0.000	0.000	0.000

- Introduction
- A development of τ^*

0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	1	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0

0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	1	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0

Introduction to Mazeinda

Introduction

A development of τ*

Computations in parallel

A development of τ^*

0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	1	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0