Fuv8

Reputation: 905

column paired statistic test

I have two data.frames that look like:

DF1      
  Col1     Col2     Col3    Col4    
 0.1854   0.1660   0.1997   0.4632
 0.1760   0.1336   0.1985   0.4496
 0.1737   0.1316   0.1943   0.4446    
 0.1660   0.1300   0.1896   0.4439


DF2       
  Col1     Col2     Col3    Col4    
 0.2456    0.2107   0.2688  0.5079
 0.2399    0.1952   0.2356  0.1143
 0.2375    0.1947   0.2187  0.0846    
 0.2368    0.1922   0.2087  0.1247

I would like to run wilcox.test between the two data.frames, specifically between matching columns, so that:

test1: between Col1 of DF1 and Col1 of DF2     
test2: between Col2 of DF1 and Col2 of DF2      

and so on.

I used the following script:

for (i in 1:length(DF2)){ 
    test <- apply(DF1, 2, function(x) wilcox.test(x, as.numeric(DF2[[i]]), correct=TRUE))
}

Unfortunately, the output of this script differs from the output of the same test run with the following script:

test1 = wilcox.test(DF1[,1], DF2[,1],  correct=FALSE)     
test2 = wilcox.test(DF1[,2], DF2[,2],  correct=FALSE)       

Since the real data.frames have around 100 columns and 200 rows (both have the same dimensions), I cannot run the test column by column by hand.

After dput(DF1):

structure(list(Col1 = c(0.1854, 0.1760, 0.1737, 0.1660,....),  class = "data.frame", row.names = c(NA, -100L)))

The same for DF2

Upvotes: 3

Views: 109

Answers (2)

kith

Reputation: 5566

It might be easier to have your for loop iterate over the column names instead:

for (name in colnames(DF2)){
    ...
    wilcox.test(DF1[,name], DF2[,name],  correct=FALSE)
    ...
}
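For reference, the loop in the question reassigns test on every pass and compares every column of DF1 against a single column of DF2, so after the loop it only holds the comparisons against the last column of DF2. A minimal sketch of the name-based loop that keeps one result per column pair could look like this. The data are only the four rows printed in the question, and the names results and p_values are made up for illustration:

```r
# Only the rows shown in the question; the real data.frames are larger.
DF1 <- data.frame(
  Col1 = c(0.1854, 0.1760, 0.1737, 0.1660),
  Col2 = c(0.1660, 0.1336, 0.1316, 0.1300),
  Col3 = c(0.1997, 0.1985, 0.1943, 0.1896),
  Col4 = c(0.4632, 0.4496, 0.4446, 0.4439)
)
DF2 <- data.frame(
  Col1 = c(0.2456, 0.2399, 0.2375, 0.2368),
  Col2 = c(0.2107, 0.1952, 0.1947, 0.1922),
  Col3 = c(0.2688, 0.2356, 0.2187, 0.2087),
  Col4 = c(0.5079, 0.1143, 0.0846, 0.1247)
)

# 'results' is a hypothetical name: one wilcox.test result per column name.
results <- list()
for (name in colnames(DF2)) {
  # Compare the column in DF1 with the same-named column in DF2.
  results[[name]] <- wilcox.test(DF1[, name], DF2[, name], correct = FALSE)
}

# Pull the p-values into a named numeric vector.
p_values <- sapply(results, function(res) res$p.value)
print(p_values)
```

Indexing the list by name (results[["Col1"]]) gives the same object as the single call wilcox.test(DF1[,1], DF2[,1], correct=FALSE).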

Upvotes: 1

csgillespie

Reputation: 60462

This is a classic mapply case: mapply is basically a multivariate version of sapply. Here it walks through the columns of both data frames in parallel, passing each matching pair to wilcox.test. First, create some data:

df1 = data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))
df2 = data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))

Then use mapply:

l = mapply(wilcox.test, df1, df2, SIMPLIFY=FALSE, correct=FALSE)

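Here the variable l is a list with one test result per pair of columns. So each direct call below gives the same result as the list element that follows it: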

wilcox.test(df1[,1], df2[,1],  correct=FALSE) 
l[[1]]
wilcox.test(df1[,2], df2[,2],  correct=FALSE) 
l[[2]]
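With around 100 columns you will probably want the p-values rather than the printed test objects. A rough sketch of pulling them out of the list follows. The set.seed call and the name p_values are my additions for illustration, and correct=FALSE is passed through MoreArgs here, which should behave the same as passing it directly as above:

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

df1 <- data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))
df2 <- data.frame(c1 = runif(10), c2 = runif(10), c3 = runif(10), c4 = runif(10))

# One wilcox.test per pair of columns; MoreArgs holds arguments shared by every call.
l <- mapply(wilcox.test, df1, df2, SIMPLIFY = FALSE,
            MoreArgs = list(correct = FALSE))

# Named numeric vector of p-values, one per column pair.
p_values <- sapply(l, function(res) res$p.value)
print(p_values)

# Sanity check against the direct call for the first pair of columns.
direct <- wilcox.test(df1[, 1], df2[, 1], correct = FALSE)
print(all.equal(l[[1]]$p.value, direct$p.value))
```

The list keeps the column names of df1, so l$c1 or p_values["c1"] also works.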

Upvotes: 6
