Reputation: 38652
My question is related to this one but a more complex example, in which I would like to statistically compare multiple columns in all combinations, and each of the columns has a different number of samples.
Consider the original data:
# A tibble: 51 x 3
trial person score
<chr> <chr> <dbl>
1 foo a 0.266
2 bar b 0.372
3 foo c 0.573
4 bar a 0.908
5 foo b 0.202
6 bar c 0.898
7 foo a 0.945
8 bar b 0.661
9 foo c 0.629
10 foo b 0.206
For each trial type, I'd like to run a statistical test comparing the scores of each person. So, I need the following test results:
foo
, compare all score
samples of persons A–B, B–C, C–Abar
, compare all score
samples of persons A–B, B–C, C–AOf course, there are more than two trials, and more than three persons.
Hence, the solution using group_split
given in the other question does not work, as it implies always testing agains the first person (in my case), not all pairwise combinations.
So, in the following code, I'm stuck at two points:
library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
library(broom)
set.seed(1)
df = tibble::tibble(
trial = rep(c("foo", "bar"), 30),
person = rep(c("a", "b", "c"), 20),
score = runif(60)
) %>%
filter(score > 0.2)
df %>%
group_by(person, trial) %>%
summarize(scores = list(score)) %>%
spread(person, scores) %>%
group_split(trial) %>%
map_df(function(data) {
data %>%
summarize_at(vars(b:c), function(x) {
wilcox.test(.$a, x, paired = FALSE) %>% broom::tidy
})
})
#> Error in wilcox.test.default(.$a, x, paired = FALSE): 'x' must be numeric
Created on 2019-05-29 by the reprex package (v0.3.0)
The value of x
is apparently not just the actual list of scores, but the column vector of scores for a single trial. But I don't know how else to deal with the fact that the number of samples in each person is different.
Also, I still have to manually specify the column names, which would already be a combinatorial nightmare if there were more than, say, four persons.
I can somehow get the combinations as such:
df %>%
group_split(trial) %>%
map_df(function(data) {
combinations = expand(tibble(x = unique(data$person), y = unique(data$person)), x, y) %>% filter(x != y)
})
… but that doesn't really help in creating columns for comparison.
What could I do to make this work?
Upvotes: 2
Views: 1409
Reputation: 3690
Here is an alternative solution that uses nesting to handle groups (persons) with different number of measurements.
library("broom")
library("tidyverse")
set.seed(1)
df <-
tibble(
trial = rep(c("foo", "bar"), 30),
person = rep(c("a", "b", "c"), 20),
score = runif(60)
) %>%
filter(score > 0.2)
comparisons <- df %>%
expand(
trial,
group1 = person,
group2 = person
) %>%
filter(
group1 < group2
)
comparisons
#> # A tibble: 6 × 3
#> trial group1 group2
#> <chr> <chr> <chr>
#> 1 bar a b
#> 2 bar a c
#> 3 bar b c
#> 4 foo a b
#> 5 foo a c
#> 6 foo b c
df <- df %>% nest_by(trial, person)
df
#> # A tibble: 6 × 3
#> # Rowwise: trial, person
#> trial person data
#> <chr> <chr> <list<tibble[,1]>>
#> 1 bar a [8 × 1]
#> 2 bar b [8 × 1]
#> 3 bar c [8 × 1]
#> 4 foo a [9 × 1]
#> 5 foo b [9 × 1]
#> 6 foo c [9 × 1]
comparisons %>%
inner_join(
df, by = c("trial", "group1" = "person")
) %>%
inner_join(
df, by = c("trial", "group2" = "person")
) %>%
mutate(
p.value = map2_dbl(
data.x, data.y, ~ wilcox.test(.x$score, .y$score)$p.value
)
)
#> # A tibble: 6 × 6
#> trial group1 group2 data.x data.y p.value
#> <chr> <chr> <chr> <list<tibble[,1]>> <list<tibble[,1]>> <dbl>
#> 1 bar a b [8 × 1] [8 × 1] 0.878
#> 2 bar a c [8 × 1] [8 × 1] 1
#> 3 bar b c [8 × 1] [8 × 1] 0.959
#> 4 foo a b [9 × 1] [9 × 1] 1
#> 5 foo a c [9 × 1] [9 × 1] 1
#> 6 foo b c [9 × 1] [9 × 1] 0.863
Created on 2022-03-17 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 10671
This will allow you to programmatically specify combinations and get around the error you were hitting in wilcox.test()
.
combos <- unique(df$person) %>%
combn(2, simplify = F) %>%
set_names(map_chr(., ~ paste(., collapse = "_")))
df %>%
group_split(trial) %>%
set_names(map_chr(., ~ unique(.$trial))) %>%
map_df(function(x) {
map_df(combos, function(y) {
filter(x, person %in% y) %>%
wilcox.test(score ~ person, data = .) %>%
broom::tidy()
}, .id = "contrast")
}, .id = "trial")
# A tibble: 6 x 6
trial contrast statistic p.value method alternative
<chr> <chr> <dbl> <dbl> <chr> <chr>
1 bar a_b 34 0.878 Wilcoxon rank sum test two.sided
2 bar a_c 32 1 Wilcoxon rank sum test two.sided
3 bar b_c 31 0.959 Wilcoxon rank sum test two.sided
4 foo a_b 41 1 Wilcoxon rank sum test two.sided
5 foo a_c 41 1 Wilcoxon rank sum test two.sided
6 foo b_c 43 0.863 Wilcoxon rank sum test two.sided
Since this differs a lot from the pattern you started with, I'm not sure it will work for your real world case, but it works here so I wanted to share.
Upvotes: 1