cat cat

Run a Wilcoxon test and a t-test on all columns at once using a for loop

I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2", and the remaining columns contain 0s and 1s.

For example:

SYSTEM    Q1  Q2
System 1   0   1
System 1   1   0
System 2   1   1
System 2   0   0
System 2   1   1

How can I write R code that runs a paired Wilcoxon test (and a t-test) on multiple columns, from 1 to 100, using a for loop or another recommended approach?

Here is my data:

x<-"SYSTEM  Q1  Q2 Q3 Q4 Q5
S1  0   1    0   0  0   
S1  1   0    1   1  1
S2  1   1    1   1  1     
S2  0   0    1   1   0
S2  1   1   0    0  0"
mydata <- read.table(textConnection(x), header = TRUE)

n <- 1e4
df2 <- data.frame(
  SYSTEM = sample(mydata$SYSTEM, n, TRUE),
  Q1 = sample(mydata$Q1, n, TRUE),
  Q2 = sample(mydata$Q2, n, TRUE), 
  Q3 = sample(mydata$Q3, n, TRUE),
  Q4 = sample(mydata$Q4, n, TRUE),
  Q5 = sample(mydata$Q5, n, TRUE)
)

Answers (1)

jay.sf

You may define a formula once and update its left-hand side in each iteration.

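fo <- x ~ SYSTEM  # template formula; the left-hand side `x` is only a placeholder

# loop over all question columns (everything except SYSTEM)
t(sapply(names(df2[-1]), \(x) {
  # update() swaps the placeholder for the current column, e.g. Q1 ~ SYSTEM
  wt <- wilcox.test(update(fo, paste(x, '~ .')), df2, paired=TRUE)[c('statistic', 'p.value')]
  tt <- t.test(update(fo, paste(x, '~ .')), df2, paired=TRUE)[c('statistic', 'p.value')]
  unlist(c(w=wt, t=tt))  # one named vector per column, prefixed w. and t.
}))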
#    w.statistic.V w.p.value t.statistic.t t.p.value
# Q1       1545406 0.2021674    -1.2754967 0.2021928
# Q2       1592619 0.6752919     0.4188777 0.6753235
# Q3       1544422 0.8408638     0.2007856 0.8408744
# Q4       1572435 0.4583192     0.7416000 0.4583646
# Q5       1634352 0.2405299     1.1737235 0.2405617
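
Since the question asked for a for loop, here is a minimal sketch of the same idea that passes the two groups as separate vectors instead of using a formula. As far as I know, recent R versions (4.4.0 and later) no longer accept `paired` in the formula methods of `wilcox.test` and `t.test`, so this form may be more portable. The data frame `toy`, the number of pairs `k`, and the result matrix `res` are made up for illustration, and the i-th S1 row is assumed to be paired with the i-th S2 row.

```r
# Hypothetical toy data: equal-sized S1 and S2 blocks with 0/1 answers
set.seed(1)
k <- 50  # assumed number of pairs
toy <- data.frame(
  SYSTEM = rep(c("S1", "S2"), each = k),
  Q1 = rbinom(2 * k, 1, 0.5),
  Q2 = rbinom(2 * k, 1, 0.5),
  Q3 = rbinom(2 * k, 1, 0.5)
)

s1 <- toy$SYSTEM == "S1"
s2 <- toy$SYSTEM == "S2"
stopifnot(sum(s1) == sum(s2))  # a paired test needs equal group sizes

qs <- setdiff(names(toy), "SYSTEM")
res <- matrix(NA_real_, length(qs), 4,
              dimnames = list(qs, c("w.statistic", "w.p.value",
                                    "t.statistic", "t.p.value")))

for (q in qs) {
  a <- toy[[q]][s1]  # answers under S1
  b <- toy[[q]][s2]  # answers under S2, paired with `a` by position
  # 0/1 data produce ties and zero differences, so wilcox.test warns that it
  # falls back to a normal approximation; the warning is silenced here
  wt <- suppressWarnings(wilcox.test(a, b, paired = TRUE))
  tt <- t.test(a, b, paired = TRUE)
  res[q, ] <- c(wt$statistic, wt$p.value, tt$statistic, tt$p.value)
}
res
```

Pre-allocating `res` and filling it row by row avoids growing an object inside the loop, which matters once there are 100 columns.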

Note that your example data are flawed: S1 and S2 in SYSTEM must have the same number of rows to run a paired test. Inspect str(wilcox.test(.)) and str(t.test(.)) from a single run if you want to extract more elements than c('statistic', 'p.value'). Also consider @IRTFM's suggestion to account for multiple comparisons, for example with a Bonferroni or FDR correction.
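
As a rough sketch of the multiple-comparison point, base R's `p.adjust` can correct a whole vector of p-values in one call. The vector `p` below simply reuses the Wilcoxon p-values from the output above; the names `p` and `adj` are only illustrative.

```r
# Wilcoxon p-values copied from the result table in this answer
p <- c(Q1 = 0.2021674, Q2 = 0.6752919, Q3 = 0.8408638,
       Q4 = 0.4583192, Q5 = 0.2405299)

adj <- cbind(
  raw        = p,
  bonferroni = p.adjust(p, method = "bonferroni"),  # p * number of tests, capped at 1
  fdr        = p.adjust(p, method = "fdr")          # Benjamini-Hochberg
)
adj
```

With 100 columns you would pass the full p-value column of the result matrix to `p.adjust` instead of typing the values by hand.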


Data:

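m <- 1e4; n <- 5  # m rows, n question columns
set.seed(42)
# each=n/2 is truncated to 2, so the S1,S1,S2,S2 pattern is recycled over all m rows;
# this still gives m/2 rows per system
df2 <- data.frame(SYSTEM=rep(c('S1', 'S2'), each=n/2), matrix(sample(0:1, m*n, replace=TRUE), m, n))
names(df2)[-1] <- paste0('Q', 1:n)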
