mihagazvoda

Reputation: 1377

Group by and pairwise hypothesis testing with the first group on the same column

Let's say I have the mtcars dataset with columns mpg and cyl.

mpg cyl
21.0   6
21.0   6
22.8   4
21.4   6
18.7   8
18.1   6

I would like to calculate all t.test() (or wilcox.test()) statistics between the group where cyl == 4 and each of the other groups. The result should be a tibble that looks like the output of:

library(dplyr)

# mpg values for each cylinder group, as plain numeric vectors
mpg_4 <- mtcars %>% filter(cyl == 4) %>% pull(mpg)
mpg_6 <- mtcars %>% filter(cyl == 6) %>% pull(mpg)
mpg_8 <- mtcars %>% filter(cyl == 8) %>% pull(mpg)

# one tidy row per comparison against the cyl == 4 group
bind_rows(
  broom::tidy(t.test(mpg_4, mpg_4)),
  broom::tidy(t.test(mpg_4, mpg_6)),
  broom::tidy(t.test(mpg_4, mpg_8))
)

I would like to do it using purrr and broom, unless there's a cleaner way. Please note that it should work for n groups and should be easy to switch to a different test.

Upvotes: 1

Views: 84

Answers (1)

Artem Sokolov

Reputation: 13731

First, we isolate vectors of mpg values for each cyl into their own list elements:

X <- mtcars %>% group_by(cyl) %>% summarize_at("mpg", list) %>% deframe
# $`4`
#  [1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4

# $`6`
# [1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7

# $`8`
#  [1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0
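
For comparison, the same named list can probably be built in base R with split(). This is only a sketch of an alternative, not part of the pipeline above:

```r
# Base R sketch: split mpg by cyl into a named list of numeric vectors
X <- split(mtcars$mpg, mtcars$cyl)

names(X)    # group labels: "4" "6" "8"
lengths(X)  # group sizes: 11, 7, 14
```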

We then compute t.test for each element of the list against the first and combine results into a data frame:

map( X, t.test, X[["4"]] ) %>% map( broom::tidy ) %>% bind_rows( .id = "cyl" )
# # A tibble: 3 x 11
#   cyl   estimate estimate1 estimate2 statistic    p.value parameter conf.low conf.high method                  alternative
#   <chr>    <dbl>     <dbl>     <dbl>     <dbl>      <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>      
# 1 4         0         26.7      26.7      0    1               20      -4.01      4.01 Welch Two Sample t-test two.sided  
# 2 6        -6.92      19.7      26.7     -4.72 0.000405        13.0   -10.1      -3.75 Welch Two Sample t-test two.sided  
# 3 8       -11.6       15.1      26.7     -7.60 0.00000164      15.0   -14.8      -8.32 Welch Two Sample t-test two.sided  
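
Since the question asks for something that works for n groups and is easy to switch to another test, one way to wrap the idea is sketched below. The helper name compare_to_ref and its arguments are hypothetical, not from purrr or broom. It passes the reference group as the first argument, as in the question's t.test(mpg_4, mpg_6), so the signs of the estimates are flipped relative to the output above.

```r
library(purrr)
library(dplyr)

# Hypothetical helper (an illustration, not an existing API): run `test`
# between a reference group and every element of a named list of vectors.
compare_to_ref <- function(groups, ref, test = t.test, ...) {
  groups %>%
    # extra arguments in ... are forwarded to the chosen test
    map(function(g) test(groups[[ref]], g, ...)) %>%
    map(broom::tidy) %>%
    bind_rows(.id = "group")
}

X <- split(mtcars$mpg, mtcars$cyl)

# Welch t-tests of cyl == 4 against every group
compare_to_ref(X, "4")

# Same comparisons with Wilcoxon rank-sum tests; exact = FALSE because
# mpg contains ties
compare_to_ref(X, "4", test = wilcox.test, exact = FALSE)
```

Any test that takes two numeric vectors and returns an object broom::tidy() understands should slot in the same way. The columns of the result depend on the test chosen.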

Upvotes: 1
