Let's say I have the `mtcars` dataset with columns `mpg` and `cyl`:
```
 mpg cyl
21.0   6
21.0   6
22.8   4
21.4   6
18.7   8
18.1   6
```
I would like to calculate `t.test()` (or `wilcox.test()`) statistics comparing the group where `cyl == 4` against each of the other groups. The result should be a tibble like the one produced by:
```r
library(dplyr)

# Extract mpg for each cylinder group as a plain numeric vector
mpg_4 <- mtcars %>% filter(cyl == 4) %>% pull(mpg)
mpg_6 <- mtcars %>% filter(cyl == 6) %>% pull(mpg)
mpg_8 <- mtcars %>% filter(cyl == 8) %>% pull(mpg)

# One row per comparison against the cyl == 4 group
bind_rows(
  broom::tidy(t.test(mpg_4, mpg_4)),
  broom::tidy(t.test(mpg_4, mpg_6)),
  broom::tidy(t.test(mpg_4, mpg_8))
)
```
I would like to do this with `purrr` and `broom`, unless there's a cleaner way. Note that it should work for any number of groups, and it should be easy to switch to a different test.
Answer:
First, we isolate the vectors of `mpg` values for each `cyl` into their own list elements:
```r
library(dplyr)
library(tibble)  # for deframe()

X <- mtcars %>% group_by(cyl) %>% summarize_at("mpg", list) %>% deframe()
# $`4`
# [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4
# $`6`
# [1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7
# $`8`
# [1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0
```
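As a side note, and not part of the original answer, base R's `split()` builds the same named list without needing `tibble::deframe()`. It splits `mpg` by the values of `cyl` and names the elements `"4"`, `"6"` and `"8"`, keeping the original row order within each group.

```r
# Base R alternative: split() groups mpg by cyl into a named list.
# Element names come from the levels of cyl: "4", "6", "8".
X <- split(mtcars$mpg, mtcars$cyl)
str(X)
```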
We then run `t.test()` for each element of the list against the `cyl == 4` group and combine the results into a single tibble:
```r
library(purrr)
map(X, t.test, X[["4"]]) %>%
  map(broom::tidy) %>%
  bind_rows(.id = "cyl")
# # A tibble: 3 x 11
# cyl estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 4 0 26.7 26.7 0 1 20 -4.01 4.01 Welch Two Sample t-test two.sided
# 2 6 -6.92 19.7 26.7 -4.72 0.000405 13.0 -10.1 -3.75 Welch Two Sample t-test two.sided
# 3 8 -11.6 15.1 26.7 -7.60 0.00000164 15.0 -14.8 -8.32 Welch Two Sample t-test two.sided
```
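To address the "any number of groups" and "different test" requirements directly, the split/map/tidy steps can be wrapped in a function that takes the test as an argument. This is a minimal sketch, and `compare_to_ref()` and its argument names are hypothetical. It puts the reference group first, matching the question's `t.test(mpg_4, mpg_6)` order, so its estimates have the opposite sign from the output above. Any two-sample test whose result `broom::tidy()` can handle should work, and extra arguments such as `exact = FALSE` are forwarded through `...`.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not from the original answer): compares every group of
# `value` (split by `group`) against the reference group `ref`, using any
# two-sample test function `test` whose result broom::tidy() can handle.
compare_to_ref <- function(data, value, group, ref, test = t.test, ...) {
  groups <- split(data[[value]], data[[group]])
  ref_values <- groups[[as.character(ref)]]
  groups %>%
    # Reference group first, as in the question's t.test(mpg_4, mpg_6)
    map(function(x) test(ref_values, x, ...)) %>%
    map(broom::tidy) %>%
    # The list names become a column named after the grouping variable
    bind_rows(.id = group)
}

# Welch t-test of cyl == 4 against every cyl group
compare_to_ref(mtcars, "mpg", "cyl", ref = 4)

# Wilcoxon rank-sum test instead; exact = FALSE is forwarded via `...`
# because mpg contains ties
compare_to_ref(mtcars, "mpg", "cyl", ref = 4, test = wilcox.test, exact = FALSE)

# Any number of groups works, e.g. the three levels of gear
compare_to_ref(mtcars, "mpg", "gear", ref = 3)
```

Because the test is just a function argument, the pipeline itself never changes. Only the function and its extra arguments vary between calls.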