Reputation: 65
I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2", and the remaining columns contain 0's and 1's. For example:
SYSTEM | Q1 | Q2 |
---|---|---|
System 1 | 0 | 1 |
System 1 | 1 | 0 |
System 2 | 1 | 1 |
System 2 | 0 | 0 |
System 2 | 1 | 1 |
How can I write R code to run a paired Wilcoxon test on each of multiple columns (1 to 100), using a for loop or another recommended solution?
Here is my data:
x<-"SYSTEM Q1 Q2 Q3 Q4 Q5
S1 0 1 0 0 0
S1 1 0 1 1 1
S2 1 1 1 1 1
S2 0 0 1 1 0
S2 1 1 0 0 0"
mydata <- read.table(textConnection(x), header = TRUE)
n <- 1e4
df2 <- data.frame(
SYSTEM = sample(mydata$SYSTEM, n, TRUE),
Q1 = sample(mydata$Q1, n, TRUE),
Q2 = sample(mydata$Q2, n, TRUE),
Q3 = sample(mydata$Q3, n, TRUE),
Q4 = sample(mydata$Q4, n, TRUE),
Q5 = sample(mydata$Q5, n, TRUE)
)
Upvotes: 2
Views: 410
Reputation: 73437
You may use a formula and update it in each iteration.
fo <- x ~ SYSTEM
t(sapply(names(df2[-1]), \(x) {
wt <- wilcox.test(update(fo, paste(x, '~ .')), df2, paired=TRUE)[c('statistic', 'p.value')]
tt <- t.test(update(fo, paste(x, '~ .')), df2, paired=TRUE)[c('statistic', 'p.value')]
unlist(c(w=wt, t=tt))
}))
# w.statistic.V w.p.value t.statistic.t t.p.value
# Q1 1545406 0.2021674 -1.2754967 0.2021928
# Q2 1592619 0.6752919 0.4188777 0.6753235
# Q3 1544422 0.8408638 0.2007856 0.8408744
# Q4 1572435 0.4583192 0.7416000 0.4583646
# Q5 1634352 0.2405299 1.1737235 0.2405617
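As far as I know, since R 4.4.0 the formula methods of wilcox.test() and t.test() raise an error when paired=TRUE is passed, so the call above may fail on newer installations. The following is a minimal sketch of one possible workaround, not the answer's own method: it splits each column by SYSTEM and calls the default method with two vectors. The data frame `toy`, its size, and the assumption that the i-th S1 row is paired with the i-th S2 row are made up for illustration.

```r
# Toy data (hypothetical): 200 rows, equal group sizes, 5 binary columns
set.seed(1)
m <- 200; n <- 5
toy <- data.frame(
  SYSTEM = rep(c("S1", "S2"), each = m / 2),
  matrix(sample(0:1, m * n, replace = TRUE), m, n)
)
names(toy)[-1] <- paste0("Q", 1:n)

is_s1 <- toy$SYSTEM == "S1"
stopifnot(sum(is_s1) == sum(!is_s1))  # a paired test needs equal group sizes

res <- t(sapply(names(toy)[-1], function(q) {
  x <- toy[[q]][is_s1]   # S1 values, in row order
  y <- toy[[q]][!is_s1]  # S2 values, paired with x by position
  # exact = FALSE: 0/1 data produce ties and zero differences,
  # so the normal approximation is requested explicitly
  wt <- wilcox.test(x, y, paired = TRUE, exact = FALSE)
  c(V = unname(wt$statistic), p.value = wt$p.value)
}))
res
```

The same loop works for 100 columns, since it iterates over every column name except SYSTEM.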
Note that your example data is flawed: S1 and S2 in SYSTEM need to be of equal size to run a paired test. Look into str(wilcox.test(.)) and str(t.test(.)) of a single run in case you want to extract more than c('statistic', 'p.value'). Also consider @IRTFM's suggestion to take multiple comparisons into account, for example with a Bonferroni or FDR correction.
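To illustrate the multiple-comparison point, here is a small sketch using base R's p.adjust() on the Wilcoxon p-values printed in the output above. Choosing "BH" (Benjamini-Hochberg) as the FDR method is my assumption; other methods listed in ?p.adjust may suit your case better.

```r
# Wilcoxon p-values copied from the result table above
p_raw <- c(Q1 = 0.2021674, Q2 = 0.6752919, Q3 = 0.8408638,
           Q4 = 0.4583192, Q5 = 0.2405299)

# Raw p-values next to Bonferroni- and FDR-adjusted ones
p_adj <- cbind(
  raw        = p_raw,
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  fdr        = p.adjust(p_raw, method = "BH")
)
p_adj
```

With 100 columns you would pass the whole vector of 100 p-values to p.adjust() at once, since the adjustment depends on the number of tests.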
Data:
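m <- 1e4; n <- 5  # m rows, n question columns
set.seed(42)
# each = n/2 is 2.5, which rep() truncates to 2; the four labels (S1, S1, S2, S2)
# are then recycled over the m rows, so both groups end up the same size
df2 <- data.frame(SYSTEM=rep(c('S1', 'S2'), each=n/2), matrix(sample(0:1, m*n, replace=TRUE), m, n))
names(df2)[-1] <- paste0('Q', 1:n)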
Upvotes: 1