column paired statistic test

Question

I have two data.frames that look like:

DF1      
  Col1     Col2     Col3    Col4    
 0.1854   0.1660   0.1997   0.4632
 0.1760   0.1336   0.1985   0.4496
 0.1737   0.1316   0.1943   0.4446    
 0.1660   0.1300   0.1896   0.4439


DF2       
  Col1     Col2     Col3    Col4    
 0.2456    0.2107   0.2688  0.5079
 0.2399    0.1952   0.2356  0.1143
 0.2375    0.1947   0.2187  0.0846    
 0.2368    0.1922   0.2087  0.1247

I would like to perform wilcox.test between the two data.frames and specifically between paired columns, so that:

test1: between Col1 of DF1 and Col1 of DF2     
test2: between Col2 of DF1 and Col2 of DF2

and so on.

I used the following script:

for (i in 1:length(DF2)){ 
    test <- apply(DF1, 2, function(x) wilcox.test(x, as.numeric(DF2[[i]]), correct=TRUE))
}

Unfortunately, the output of this script differs from the output of the same test run with the following script:

test1 = wilcox.test(DF1[,1], DF2[,1],  correct=FALSE)     
test2 = wilcox.test(DF1[,2], DF2[,2],  correct=FALSE)

Since the real data.frames have around 100 columns and 200 rows (both have the same dimensions), I cannot run the test column by column by hand.

After dput(DF1):

structure(list(Col1 = c(0.1854, 0.1760, 0.1737, 0.1660,....),  class = "data.frame", row.names = c(NA, -100L)))

The same for DF2

csgillespie · Accepted Answer

This is a classic mapply case: mapply is essentially a multivariate version of sapply. Here it steps through the columns of both data frames in parallel, passing each pair of matching columns to the function. First, create some data:

df1 = data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))
df2 = data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))

Then use mapply:

l = mapply(wilcox.test, df1, df2, SIMPLIFY=FALSE, correct=FALSE)

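Here l is a list holding one test result per column pair, so each direct call below gives the same statistic and p-value as the list element that follows it: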

wilcox.test(df1[,1], df2[,1],  correct=FALSE) 
l[[1]]
wilcox.test(df1[,2], df2[,2],  correct=FALSE) 
l[[2]]
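
With around 100 columns, you will probably want the p-values in a single vector rather than inspecting each list element. The following is a minimal sketch under assumptions: the seed and the name p_values are only for illustration, and the data are random as above. It also suggests why the loop in the question gave different results: test is overwritten on every iteration, so after the loop it holds every column of DF1 compared with the last column of DF2, and it uses correct=TRUE where the manual calls use correct=FALSE.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

df1 <- data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))
df2 <- data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))

# One test per column pair: df1$c1 vs df2$c1, df1$c2 vs df2$c2, ...
l <- mapply(wilcox.test, df1, df2, SIMPLIFY = FALSE, correct = FALSE)

# Pull the p-value out of each "htest" result into a named numeric vector
p_values <- vapply(l, function(res) res$p.value, numeric(1))

# Sanity check: the first element matches the direct call on the first columns
direct <- wilcox.test(df1[, 1], df2[, 1], correct = FALSE)
stopifnot(isTRUE(all.equal(p_values[["c1"]], direct$p.value)))

p_values
```

The vector keeps the column names of df1, so p_values["c3"] gives the result for the third pair. If the rows of the two data frames are matched observations, passing paired = TRUE to mapply in the same way as correct = FALSE would request the signed-rank test instead; whether that applies depends on your data.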
