Ib Nemer

Reputation: 85

How to do Wilcoxon test on columns between two dataframes

I have two dataframes:

D9 <- as.data.frame(DF$As,DF$Cd,DF$Cu,DF$Cr,DF$Ni,DF$Pb,DF$Zn)
D10 <- as.data.frame(DO$As,DO$Cd,DO$Cu,DO$Cr,DO$Ni,DO$Pb,DO$Zn)

I want to apply the Wilcoxon test to each pair of matching columns (DF$As vs. DO$As, and so on). I tried the following code:

lapply(ncol(D9), function(i) {wilcox.test((D9[,i]),(D10[,i]))})

And the output was:

[[1]]
Wilcoxon rank sum test
data:  (D9[, i]) and (D10[, i])
W = 107, p-value = 0.9834
alternative hypothesis: true location shift is not equal to 0

I only get a single result instead of one per column. What am I doing wrong?

Any help is appreciated.

Upvotes: 0

Views: 3103

Answers (4)

Karolis Koncevičius

Reputation: 9656

Here is an alternative using a package. It runs the Wilcoxon test column by column, comparing columns 1-2 of iris with columns 3-4.

library(matrixTests)
col_wilcoxon_twosample(iris[,1:2], iris[,3:4])

             obs.x obs.y obs.tot statistic       pvalue alternative location.null exact corrected
Sepal.Length   150   150     300     19249 1.702530e-26   two.sided             0 FALSE      TRUE
Sepal.Width    150   150     300     22362 1.295486e-49   two.sided             0 FALSE      TRUE
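For comparison, the same column-by-column tests can be run in base R without the package. This is only a minimal sketch, assuming the default two-sided rank-sum test; the name p_base is illustrative and not part of the answer above.

```r
# Pair up columns 1-2 of iris with columns 3-4 and keep only the p-values.
# mapply() walks both data frames column by column, like the package call above.
p_base <- mapply(
  function(x, y) wilcox.test(x, y)$p.value,
  iris[, 1:2],
  iris[, 3:4]
)

# Named numeric vector, one p-value per column of the first data frame
print(p_base)
```

The result is named after the columns of the first data frame, so it can be lined up with the pvalue column in the table above.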

Upvotes: 0

Pkaksha

Reputation: 21

Suppose we have two data frames, d1 and d2, each with N observations, where d1 holds the X variables and d2 holds the Y variables.
To run a Wilcoxon-Mann-Whitney test between every column of d1 and every column of d2:
1. Read the data:

d1 <- data.frame(read.table("data1", header = TRUE, stringsAsFactors = FALSE, sep = ""))
d2 <- data.frame(read.table("data2", header = TRUE, stringsAsFactors = FALSE, sep = ""))

Assume d1 has at least as many columns as d2:

length(colnames(d1)) >= length(colnames(d2))  

2. Declare a matrix to store the p-values:

pvalue <- matrix(nrow = length(colnames(d2)), ncol = length(colnames(d1)))

3. Run the test for each column of d2 against each column of d1:

# Name the rows and columns once, outside the loop
rownames(pvalue) <- colnames(d2)
colnames(pvalue) <- colnames(d1)
for (i in seq_len(ncol(d2))) {
  for (j in seq_len(ncol(d1))) {
    # paired = TRUE runs the signed-rank test on row-matched observations;
    # drop it for the rank-sum (Mann-Whitney) test
    pvalue[i, j] <- wilcox.test(d2[, i], d1[, j], paired = TRUE)$p.value
  }
}
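The same matrix of p-values can also be built without an explicit loop. Below is a minimal sketch, assuming simulated data in place of the "data1" and "data2" files (the column names a, b, c, x, y are made up for illustration) and keeping paired = TRUE as in the loop.

```r
# Simulated stand-ins for d1 and d2: same number of rows, different columns
set.seed(7)
d1 <- data.frame(a = rnorm(15), b = rnorm(15), c = rnorm(15))
d2 <- data.frame(x = rnorm(15), y = rnorm(15))

# Outer sapply() walks the columns of d1, inner sapply() the columns of d2.
# The result has one row per column of d2 and one column per column of d1.
pvalue <- sapply(d1, function(col1) {
  sapply(d2, function(col2) {
    wilcox.test(col2, col1, paired = TRUE)$p.value
  })
})

print(pvalue)
```

Row and column names come from the data frames automatically, so no separate rownames() or colnames() call is needed.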

Note: the same approach works on a single data frame, if we want to test each column against every other column of that data frame. The diagonal compares a column with itself and is not informative.

d3 <- data.frame(read.table("data3", header = TRUE, stringsAsFactors = FALSE, sep = ""))
pvalue <- matrix(nrow = length(colnames(d3)), ncol = length(colnames(d3)))

Run the test for each column of d3 against each column of d3:

# Name the rows and columns once, outside the loop
rownames(pvalue) <- colnames(d3)
colnames(pvalue) <- colnames(d3)
for (i in seq_len(ncol(d3))) {
  for (j in seq_len(ncol(d3))) {
    pvalue[i, j] <- wilcox.test(d3[, i], d3[, j], paired = TRUE)$p.value
  }
}

Upvotes: 2

srhoades10

Reputation: 103

lapply needs a vector of indices to iterate over, so @MrFlick's suggestion will probably help (you actually only ran one Wilcoxon test).

You could also print each result in turn with a loop:

for (i in 1:ncol(D9)) {
    # print() is needed because results are not auto-printed inside a loop
    print(wilcox.test(D9[, i], D10[, i]))
}

Upvotes: 0

MrFlick

Reputation: 206253

Note that ncol(D9) returns a single number, so lapply only iterates over that one number. Use 1:ncol(D9) to iterate over every column, starting at the first (or use seq.int(ncol(D9))). See the difference between lapply(9, print) and lapply(1:9, print).
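To make the difference concrete, here is a small sketch. It assumes toy data frames standing in for D9 and D10 (three made-up columns of random values), and the names one_test and all_tests are only illustrative.

```r
# Toy stand-ins for D9 and D10 (hypothetical values, three columns each)
set.seed(1)
D9  <- data.frame(As = rnorm(10), Cd = rnorm(10), Cu = rnorm(10))
D10 <- data.frame(As = rnorm(10), Cd = rnorm(10), Cu = rnorm(10))

# ncol(D9) is the single number 3, so only column 3 is tested
one_test <- lapply(ncol(D9), function(i) wilcox.test(D9[, i], D10[, i]))

# seq_len(ncol(D9)) is 1, 2, 3, so every column is tested
all_tests <- lapply(seq_len(ncol(D9)), function(i) wilcox.test(D9[, i], D10[, i]))

length(one_test)   # a list with a single test
length(all_tests)  # a list with one test per column
```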

Alternatively, you can just map over the columns directly with

Map(wilcox.test, D9, D10)

since data.frames are really just lists.
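If only the p-values are needed, they can be pulled out of the list that Map returns. A minimal sketch, again assuming made-up toy data in place of D9 and D10; the names tests and p_values are illustrative.

```r
# Toy stand-ins for D9 and D10 with matching column names
set.seed(42)
D9  <- data.frame(As = rnorm(12), Cd = rnorm(12), Cu = rnorm(12))
D10 <- data.frame(As = rnorm(12), Cd = rnorm(12), Cu = rnorm(12))

# Map() pairs the columns by position and returns a named list of test results
tests <- Map(wilcox.test, D9, D10)

# Extract the p-value from each result into a named numeric vector
p_values <- sapply(tests, function(res) res$p.value)
print(p_values)
```

Map pairs columns by position, not by name, so both data frames need their columns in the same order.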

Upvotes: 1
