jcrunden

Reputation: 11

Run wilcoxon rank sum test on each row of a data frame

I have a large set of biological data in a data frame, as below. Each row has the condition, identifier (plate and well) and 3 replicates of Expected Phenotype (EP) and Observed Phenotype (OP).

I want to add a column with the p value of a Wilcoxon rank sum test, testing whether EP and OP are significantly different from each other for each row/Well.

head(df)

  Temp Plate Well      EP1      EP2      EP3    OP1    OP2    OP3
1 30°C    31  A01 1.395874 1.323633 1.130804 0.1352 0.1632 0.1130
2 30°C    31  A02 1.449596 1.501810 1.111663 1.1474 1.1314 1.0628
3 30°C    31  A03 1.332983 1.416245 1.081833 1.0604 1.0947 1.0790
4 30°C    31  A04 1.333371 1.556057 1.091200 0.9786 1.0009 1.0127
5 30°C    31  A05 1.362556 1.343878 1.042433 1.0152 1.0534 1.0143
6 30°C    31  A06 1.542448 1.430897 1.031030 1.0266 1.0076 0.9785

I've found these posts: "Run a wilcox function for each row in each group" and "Trying to run many anovas and get an F value for each row", but I can't seem to combine them into a script that works. I find the mapply() call in the first link completely impenetrable, and I can't work out how to get the Wilcoxon test instead of the f.stat in the second link.

Any help would be so appreciated. Thanks!

Upvotes: 0

Views: 1229

Answers (1)

dcarlson

Reputation: 11056

First let's put the data in an easier format for R by using dput(head(df)):

df <- structure(list(Temp = c("30°C", "30°C", "30°C", "30°C", "30°C", 
"30°C"), Plate = c(31L, 31L, 31L, 31L, 31L, 31L), Well = c("A01", 
"A02", "A03", "A04", "A05", "A06"), EP1 = c(1.395874, 1.449596, 
1.332983, 1.333371, 1.362556, 1.542448), EP2 = c(1.323633, 1.50181, 
1.416245, 1.556057, 1.343878, 1.430897), EP3 = c(1.130804, 1.111663, 
1.081833, 1.0912, 1.042433, 1.03103), OP1 = c(0.1352, 1.1474, 
1.0604, 0.9786, 1.0152, 1.0266), OP2 = c(0.1632, 1.1314, 1.0947, 
1.0009, 1.0534, 1.0076), OP3 = c(0.113, 1.0628, 1.079, 1.0127, 
1.0143, 0.9785)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6"))

Now wilcox.test() on a single row (columns 4:6 hold the EP replicates, columns 7:9 the OP replicates) is:

wilcox.test(unlist(df[1, 4:6]), unlist(df[1, 7:9]))
# 
#   Wilcoxon rank sum exact test
# 
# data:  unlist(df[1, 4:6]) and unlist(df[1, 7:9])
# W = 9, p-value = 0.1
# alternative hypothesis: true location shift is not equal to 0

To get just the p-value:

wilcox.test(unlist(df[1, 4:6]), unlist(df[1, 7:9]))$p.value
# [1] 0.1
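One thing to keep in mind about that 0.1: with only three replicates per group, it is the smallest two-sided p-value the exact rank sum test can return, so no row can reach p < 0.05. A minimal sketch with made-up values (x and y are hypothetical, not taken from the data above):

```r
# Perfectly separated groups of three give the most extreme rank sum
x <- c(4, 5, 6)
y <- c(1, 2, 3)
p_min <- wilcox.test(x, y)$p.value
p_min

# Same number by counting: 2 of the choose(6, 3) = 20 possible
# ways to split six ranks into two groups are this extreme
2 / choose(6, 3)
```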

So we can use apply() to get all of the rows:

p <- apply(df[, 4:9], 1, function(x) wilcox.test(x[1:3], x[4:6])$p.value)
p
#   1   2   3   4   5   6 
# 0.1 0.4 0.2 0.1 0.2 0.1 
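Since the question asks for the p-values as a new column, the result can be assigned straight back to the data frame. A minimal sketch, assuming a cut-down stand-in for df (first two rows only, Temp omitted) and selecting the replicate columns by name instead of by position; ep_cols, op_cols and the column name p.value are illustrative choices:

```r
# Cut-down stand-in for the data frame (first two rows, Temp omitted)
df <- data.frame(
  Plate = c(31L, 31L),
  Well  = c("A01", "A02"),
  EP1 = c(1.395874, 1.449596), EP2 = c(1.323633, 1.501810),
  EP3 = c(1.130804, 1.111663),
  OP1 = c(0.1352, 1.1474), OP2 = c(0.1632, 1.1314),
  OP3 = c(0.1130, 1.0628)
)

ep_cols <- c("EP1", "EP2", "EP3")
op_cols <- c("OP1", "OP2", "OP3")

# apply() hands each row to the function as a named numeric vector,
# so the replicates can be picked out by column name
df$p.value <- apply(df[, c(ep_cols, op_cols)], 1, function(x) {
  wilcox.test(x[ep_cols], x[op_cols])$p.value
})

df[, c("Well", "p.value")]
```

Selecting by name keeps the code working if the column order changes, which positional indexing such as 4:9 does not.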

Upvotes: 1
