Reputation: 108

running t.test() on multiple columns to output tibble

I have a data frame as follows

record_id   group      enzyme1     enzyme2  ... ... 
            <factor>   <dbl>       <dbl>    ... ... 
1           control    34.5        32.3     ... ...
2           control    32.1        34.1     ... ...
3           treatment  123.1       12.1     ... ...

There is a grouping variable called group and multiple dependent variables such as enzyme1, enzyme2, and so on.

I can run a t-test and wrap it into a tibble as follows:

tidy(t.test(enzyme1 ~ group, data = df))

I want to stack the t-test output for every column into one table, something like this:

              estimate   statistic   p.value   parameter   conf.low    conf.high
enzyme 1      197.7424   0.3706244   0.7119    75.3982     -865.0291   1260.514
enzyme 2      XXX.XX     X.xxx       0.XXXX    XX.XXXX     -XX.XXX     XX.XXX

and so on.

Any ideas?

Upvotes: 1

Answers (4)

akuiper

Reputation: 215117

You could also try a tidyverse approach like this:

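library(tidyverse) # dplyr and purrr are in here
library(broom)     # for tidy()
# note: funs() is deprecated since dplyr 0.8.0; kept here as originally posted
df %>% 
    summarise_at(vars(starts_with('enzyme')), funs(list(tidy(t.test(. ~ group))))) %>% 
    map(1) %>% bind_rows(.id='enzymes')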

#  enzymes estimate estimate1 estimate2 statistic    p.value parameter   conf.low conf.high                  method alternative
#1 enzyme1   -104.3      33.3     137.6 -7.168597 0.08610502  1.013697 -283.37000  74.77000 Welch Two Sample t-test   two.sided
#2 enzyme2     19.6      33.2      13.6 11.204574 0.01532388  1.637394   10.22717  28.97283 Welch Two Sample t-test   two.sided

Data:

df <- read.table(text = "record_id   group      enzyme1     enzyme2
1           control    34.5        32.3
2           control    32.1        34.1
3           treatment  123.1       12.1
4           treatment  152.1       15.1", header = TRUE)

Upvotes: 2

Flo.P

Reputation: 371

You can use map() to compute all the tests and then reduce() to bind them together:

library(broom)
library(dplyr)
library(purrr)

df <- data.frame(record_id = 1:50,
                 group = sample(c("control", "treatment"), 50, replace = TRUE),
                 enzyme1 = rnorm(50),
                 enzyme2 = rnorm(50))

map(paste0("enzyme", 1:2),
    ~ tidy(t.test(as.formula(paste0(.x, "~ group")), data = df))) %>%
  reduce(bind_rows)
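
The stacked result from the code above has no column saying which enzyme each row came from. One possible way to add it, as a minimal sketch: name the vector of column names before mapping, then let bind_rows() turn those names into a column. The object name `result`, the column name `enzyme`, and the `set.seed()` call are illustrative assumptions, not part of the original answer.

```r
library(broom)
library(dplyr)
library(purrr)

set.seed(42) # assumption: fixed seed only so the illustration is repeatable

# same simulated layout as in the answer above
df <- data.frame(record_id = 1:50,
                 group = sample(c("control", "treatment"), 50, replace = TRUE),
                 enzyme1 = rnorm(50),
                 enzyme2 = rnorm(50))

enzymes <- paste0("enzyme", 1:2)

result <- enzymes %>%
  set_names() %>%                 # name each element after itself
  map(~ tidy(t.test(as.formula(paste(.x, "~ group")), data = df))) %>%
  bind_rows(.id = "enzyme")       # list names become the "enzyme" column

result
```

Here bind_rows(.id = ...) replaces reduce(bind_rows); both stack the tibbles, but only the former keeps the list names.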

Upvotes: 2

Andrew Haynes

Reputation: 2640

You can create an empty data.frame and then use rbind() to add your information to it in a loop.

Here's an example using the iris dataset:

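library(broom)

df <- data.frame()
for (i in 1:(length(colnames(iris)) - 1)) { ## change the range to cover your own columns

  variableName <- colnames(iris)[i] ## loop through the desired colnames

  ## build the formula from the current column so each iteration tests a different variable
  f <- as.formula(paste(variableName, "~ Species"))
  df <- rbind(df, cbind(variableName, tidy(t.test(f, data = iris[1:99, ]))))

}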

Upvotes: 0

Nate

Reputation: 10671

We could take advantage of purrr::map_df(), which is in library(tidyverse), like this:

library(broom)
library(tidyverse) # purrr is in here
data(mtcars)

#reproducible data to simulate your case
mtcars2 <- filter(mtcars, cyl %in% c(4, 6)) 
mtcars2$cyl <- as.factor(mtcars2$cyl)

# capture the columns you want to t.test
cols_not_cyl <- names(mtcars2)[-2]

# turn those column names into formulas
formulas <- paste(cols_not_cyl, "~ cyl") %>%
    map(as.formula) %>% # needs to be class formula
    set_names(cols_not_cyl) # useful for map_df()

# do the tests, then stack them all together
map_df(formulas, ~ tidy(t.test(formula = ., data = mtcars2)),
       .id = "column_id")

Upvotes: 4
