Reputation: 3126

dplyr summarise multiple columns using t.test

Is it possible somehow to do a t.test over multiple variables against the same categorical variable without going through a reshaping of the dataset as follows?

data(mtcars)
library(dplyr)
library(tidyr)
j <- mtcars %>% gather(var, val, disp:qsec)
t <- j %>% group_by(var) %>% do(te = t.test(val ~ vs, data = .))

t %>% summarise(p = te$p.value)

I´ve tried using

mtcars %>% summarise_each_(funs = (t.test(. ~ vs))$p.value, vars = disp:qsec)

but it throws an error.

Bonus: How can t %>% summarise(p = te$p.value) also include the name of the grouping variable?

Upvotes: 21

Answers (4)

carfisma

Reputation: 307

I like the following solution using the powerful "broom" package:

library("dplyr")
library("broom")

your_db %>%
  group_by(grouping_variable1, grouping_variable2 ...) %>%
  do(tidy(t.test(variable_u_want_2_test ~ dicothomous_grouping_var, data = .)))

Upvotes: 14

Misha

Reputation: 3126

So I ended up hacking up a new function : df=dataframe , by_var=right hand side of formula, ... all variables on left hand side of formula (dplyr/tidyr select).

e.g: mult_t.test(mtcars,vs,disp:qsec)

mult_t.test<-function(df,by_var,...){
  require(dplyr)
  require(tidyr)
  by_var<-deparse(substitute(by_var))
  j<-df%>%gather(var,val,...)
  t<-j%>%group_by(var)%>%do(v=tes(.,by_var))
  k<-data.frame(levels(t$var),matrix(unlist(t$v),ncol=3,byrow = T))
  names(k)<-c("var",names(t$v[[1]]))
  k
}


tes<-function(df,vart){
  x<-t.test(df$val~df[[vart]])
  p<-x$estimate
  p<-c(p,p.val=x$p.value)
  p
}

Upvotes: 2

akhmed

Reputation: 3634

Realizing that the question is fairly old, here is another answer for the reference of future generations.

This is more general than the accepted answer since it allows for dynamically generated variable names rather than hard-coded.

vars_to_test <- c("disp","hp","drat","wt","qsec")
iv <- "vs"

mtcars %>%
  summarise_each_(
    funs_( 
      sprintf("stats::t.test(.[%s == 0], .[%s == 1])$p.value",iv,iv)
    ), 
    vars = vars_to_test)

which produces this:

          disp           hp       drat           wt         qsec
1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06

The idea of this solution is to use SE versions of dplyr functions (summarise_each_ and funs_) instead of NSE versions (summarise_each and funs). For more information about Standard Evaluation (SE) and Non-Standard Evaluation (NSE), please check vignette("nse").

Upvotes: 6

jazzurro

Reputation: 23574

After all discussions with @aosmith and @Misha, here is one approach. As @aosmith wrote in his/her comments, You want to do the following.

mtcars %>%
    summarise_each(funs(t.test(.[vs == 0], .[vs == 1])$p.value), vars = disp:qsec)

#         vars1        vars2      vars3        vars4        vars5
#1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06

vs is either 0 or 1 (group). If you want to run a t-test between the two groups in a variable (e.g., dips), it seems that you need to subset data as @aosmith suggested. I would like to say thank you for the contribution.

What I originally suggested works in another situation, in which you simply compare two columns. Here is sample data and codes.

foo <- data.frame(country = "Iceland",
                  year = 2014,
                  id = 1:30,
                  A = sample.int(1e5, 30, replace = TRUE),
                  B = sample.int(1e5, 30, replace = TRUE),
                  C = sample.int(1e5, 30, replace = TRUE),
                  stringsAsFactors = FALSE)

If you want to run t-tests for the A-C, and B-C combination, the following would be one way.

foo2 <- foo %>%
        summarise_each(funs(t.test(., C, pair = TRUE)$p.value), vars = A:B) 

names(foo2) <- colnames(foo[4:5])

#          A         B
#1 0.2937979 0.5316822

Upvotes: 19

dplyr summarise multiple columns using t.test

Answers (4)

Related Questions