Reputation: 3126
Is it possible to run a t.test on multiple variables against the same categorical variable without first reshaping the dataset, as is done below?
data(mtcars)
library(dplyr)
library(tidyr)
j <- mtcars %>% gather(var, val, disp:qsec)
t <- j %>% group_by(var) %>% do(te = t.test(val ~ vs, data = .))
t %>% summarise(p = te$p.value)
I've tried using
mtcars %>% summarise_each_(funs = (t.test(. ~ vs))$p.value, vars = disp:qsec)
but it throws an error.
Bonus: How can t %>% summarise(p = te$p.value) also include the name of the grouping variable?
Upvotes: 21
Views: 24006
Reputation: 307
I like the following solution using the powerful "broom" package:
library("dplyr")
library("broom")
your_db %>%
group_by(grouping_variable1, grouping_variable2 ...) %>%
do(tidy(t.test(variable_u_want_2_test ~ dichotomous_grouping_var, data = .)))
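Applied to the question's mtcars data, the template above might look like the following. This is only a sketch: it assumes the long format produced by gather() from the question, and the column names estimate1, estimate2 and p.value are the ones broom's tidy() returns for a two-sample t.test. The grouping column var is carried through, so each p-value stays labelled with its variable name.

```r
library(dplyr)
library(tidyr)
library(broom)

res <- mtcars %>%
  gather(var, val, disp:qsec) %>%          # long format: one row per (variable, value)
  group_by(var) %>%                        # one t-test per original column
  do(tidy(t.test(val ~ vs, data = .))) %>% # tidy() turns the htest object into a one-row data frame
  ungroup() %>%
  select(var, estimate1, estimate2, p.value)

res
```

Here estimate1 and estimate2 are the means of val in the vs == 0 and vs == 1 groups.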
Upvotes: 14
Reputation: 3126
So I ended up writing a new function. Its arguments are: df = the data frame, by_var = the right-hand side of the formula, and ... = all variables on the left-hand side of the formula (dplyr/tidyr select syntax).
e.g.: mult_t.test(mtcars, vs, disp:qsec)
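mult_t.test <- function(df, by_var, ...) {
  library(dplyr)
  library(tidyr)
  # Capture the unquoted grouping column name as a string
  by_var <- deparse(substitute(by_var))
  # Reshape to long format: one row per (variable, value) pair
  long <- df %>% gather(var, val, ...)
  # Run one t-test per variable; results are stored in the list column v
  res <- long %>% group_by(var) %>% do(v = tes(., by_var))
  # as.character() works whether var is a factor or a character column
  k <- data.frame(as.character(res$var), matrix(unlist(res$v), ncol = 3, byrow = TRUE))
  names(k) <- c("var", names(res$v[[1]]))
  k
}
# Returns the two group means and the p-value of the t-test
tes <- function(df, vart) {
  x <- t.test(df$val ~ df[[vart]])
  c(x$estimate, p.val = x$p.value)
}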
Upvotes: 2
Reputation: 3634
Realizing that the question is fairly old, here is another answer for the reference of future generations.
This is more general than the accepted answer because it allows dynamically generated variable names rather than hard-coded ones.
vars_to_test <- c("disp","hp","drat","wt","qsec")
iv <- "vs"
mtcars %>%
summarise_each_(
funs_(
sprintf("stats::t.test(.[%s == 0], .[%s == 1])$p.value",iv,iv)
),
vars = vars_to_test)
which produces this:
disp hp drat wt qsec
1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06
The idea of this solution is to use the SE versions of the dplyr functions (summarise_each_ and funs_) instead of the NSE versions (summarise_each and funs). For more information about Standard Evaluation (SE) and Non-Standard Evaluation (NSE), please check vignette("nse").
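summarise_each_ and funs_ have been deprecated in later dplyr releases. For readers on a recent version, the same idea can be sketched with across() and all_of(). This assumes dplyr 1.0.0 or later; the helper vector grp is introduced here for illustration and is not part of the original answer.

```r
library(dplyr)

vars_to_test <- c("disp", "hp", "drat", "wt", "qsec")
iv <- "vs"

# Pull the grouping column out once, using its name as a string
grp <- mtcars[[iv]]

p_values <- mtcars %>%
  summarise(across(
    all_of(vars_to_test),                                   # columns selected by name
    function(x) t.test(x[grp == 0], x[grp == 1])$p.value    # Welch t-test between the two groups
  ))

p_values
```

Because the data frame is not grouped, each column reaches the function at full length, so indexing it with grp lines up row by row.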
Upvotes: 6
Reputation: 23574
After discussing this with @aosmith and @Misha, here is one approach. As @aosmith wrote in the comments, you want to do the following.
mtcars %>%
summarise_each(funs(t.test(.[vs == 0], .[vs == 1])$p.value), vars = disp:qsec)
# vars1 vars2 vars3 vars4 vars5
#1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06
vs is the grouping variable and is either 0 or 1. To run a t-test between the two groups for a given variable (e.g., disp), you need to subset the data as @aosmith suggested. Thank you for the contribution.
What I originally suggested works in a different situation, in which you simply compare two columns. Here is some sample data and code.
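The summarise_each output above labels its columns vars1 to vars5, so the variable names are lost. As a rough alternative that needs no reshaping and keeps the names, one could loop over the columns in base R. This is only a sketch; the object names vars, p and out are made up for illustration.

```r
# Columns disp through qsec of mtcars, written out by name
vars <- c("disp", "hp", "drat", "wt", "qsec")

# One Welch t-test per column, split by vs; sapply() keeps the column names
p <- sapply(mtcars[vars], function(x) t.test(x ~ mtcars$vs)$p.value)

# Long result: one row per tested variable
out <- data.frame(var = names(p), p = unname(p))
out
```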
foo <- data.frame(country = "Iceland",
year = 2014,
id = 1:30,
A = sample.int(1e5, 30, replace = TRUE),
B = sample.int(1e5, 30, replace = TRUE),
C = sample.int(1e5, 30, replace = TRUE),
stringsAsFactors = FALSE)
If you want to run t-tests for the A-C and B-C combinations, the following would be one way.
foo2 <- foo %>%
summarise_each(funs(t.test(., C, paired = TRUE)$p.value), vars = A:B)
names(foo2) <- colnames(foo[4:5])
# A B
#1 0.2937979 0.5316822
Upvotes: 19