chopin_is_the_best

Reputation: 2101

Calculate p-values in a for loop

I have a dataset containing experiment data. Each day new observations come in.

A fictional example of my df, with columns day (day index), group_a (control data) and group_b (treatment data):

df <- structure(list(day = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), group_a = c(4L, 
2L, 3L, 1L, 1L, 4L, 3L, 2L, 4L), group_b = c(3L, 4L, 2L, 2L, 
2L, 2L, 3L, 4L, 5L)), .Names = c("day", "group_a", "group_b"), class = "data.frame", row.names = c(NA, 
-9L))

I want to subset this dataset and apply a Wilcoxon signed rank test like:

test <- wilcox.test(df$group_a, df$group_b, alternative = 'g')
test$p.value

In this example the test is applied to the whole dataset.

I want to apply it to day 1, then to days 1 and 2, and so on, ending up with a table that looks like this (fictional data):

day p-value
1   0.02
2   0.03
3   0.3

How can I apply the test in a for loop over "day", but on "cumulative" amount of days?

Upvotes: 4

Views: 1469

Answers (3)

Jaap

Reputation: 83215

Using:

for (i in unique(df$day)) {
  df$p.val[df$day == i] <- wilcox.test(df[df$day %in% 1:i,]$group_a, df[df$day %in% 1:i,]$group_b, alternative = 'g')$p.value
}

you get:

> df
  day group_a group_b     p.val
1   1       4       3 0.7928919
2   1       2       4 0.7928919
3   2       3       2 0.7768954
4   2       1       2 0.7768954
5   2       1       2 0.7768954
6   3       4       2 0.7084401
7   3       3       3 0.7084401
8   3       2       4 0.7084401
9   3       4       5 0.7084401

Or, if you just want the three p-values in a summary data frame:

vec <- sapply(unique(df$day), 
              function(i) wilcox.test(df[df$day %in% 1:i,]$group_a, 
                                      df[df$day %in% 1:i,]$group_b,
                                      alternative = 'g')$p.value)

df2 <- data.frame(day = unique(df$day), p.val = vec)

which gives:

> df2
  day     p.val
1   1 0.7928919
2   2 0.7768954
3   3 0.7084401
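
The same cumulative idea can be written without assuming that the days are numbered 1, 2, 3, ...: a minimal sketch that filters on day <= d instead of day %in% 1:i and uses vapply so the result is always a numeric vector. The helper name cum_pval is hypothetical, and the data frame is rebuilt from the question's example so that the block runs on its own.

```r
# Example data from the question
df <- data.frame(
  day     = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  group_a = c(4L, 2L, 3L, 1L, 1L, 4L, 3L, 2L, 4L),
  group_b = c(3L, 4L, 2L, 2L, 2L, 2L, 3L, 4L, 5L)
)

# Hypothetical helper: p-value of the test on all rows up to and including day d
cum_pval <- function(d, data) {
  sub <- data[data$day <= d, ]
  wilcox.test(sub$group_a, sub$group_b, alternative = "greater")$p.value
}

# One p-value per distinct day, computed on the cumulative subset
days <- sort(unique(df$day))
res <- data.frame(day = days,
                  p.val = vapply(days, cum_pval, numeric(1), data = df))
res
```

Because the example data contain ties, wilcox.test warns that it cannot compute an exact p-value and uses a normal approximation. Note also that wilcox.test without paired = TRUE performs the rank sum (Mann-Whitney) test; if the signed rank test named in the question is what is wanted, paired = TRUE would be needed, and the p-values would differ.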

Upvotes: 3

amatsuo_net

Reputation: 2448

This also works:

library(data.table)
setDT(df)
test_pvals <- sapply(as.list(unique(df[, day])), function(x){
  df[day <= x, wilcox.test(group_a, group_b, alternative = 'g')$p.value]
})
data.table(day = df[, unique(day)], p.val = test_pvals)
##    day     p.val
## 1:   1 0.7928919
## 2:   2 0.7768954
## 3:   3 0.7084401

Upvotes: 0

Sotos

Reputation: 51582

You can use Reduce with accumulate = TRUE:

p_value <- do.call(rbind, lapply(Reduce(rbind, split(df, df$day), accumulate = TRUE), 
               function(i) wilcox.test(i$group_a, i$group_b, alternative = 'g')$p.value))
p_value
#          [,1]
#[1,] 0.7928919
#[2,] 0.7768954
#[3,] 0.7084401

Then tidy the output:

final_df <- data.frame(day = unique(df$day), p_value)

final_df
#  day   p_value
#1   1 0.7928919
#2   2 0.7768954
#3   3 0.7084401
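
To see why this works, it can help to look at the intermediate list on its own. The sketch below rebuilds the question's example data and inspects what Reduce returns; the variable name cum_dfs is only illustrative.

```r
# Example data from the question
df <- data.frame(
  day     = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  group_a = c(4L, 2L, 3L, 1L, 1L, 4L, 3L, 2L, 4L),
  group_b = c(3L, 4L, 2L, 2L, 2L, 2L, 3L, 4L, 5L)
)

# split() gives one data frame per day; Reduce(rbind, ..., accumulate = TRUE)
# keeps every intermediate result, so element k holds all rows of days 1..k
cum_dfs <- Reduce(rbind, split(df, df$day), accumulate = TRUE)

# Number of rows in each cumulative data frame
row_counts <- unname(sapply(cum_dfs, nrow))
row_counts
```

Each element of cum_dfs is then passed to wilcox.test by the lapply call in the answer above.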

Upvotes: 1
