Reputation: 312

R: t test over multiple columns using t.test function

I am trying to run an independent t-test on many columns of a data frame. For example, I created this data frame:

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)

To run the test, I used with(df, t.test(y ~ group)):

with(test_data, t.test(a ~ grp))
with(test_data, t.test(b ~ grp))
with(test_data, t.test(c ~ grp))

I would like the output to look like this:

mean in group m mean in group y  p-value
9.747412        9.878820         0.6944
15.12936        16.49533         0.07798 
20.39531        20.20168         0.9027

How can I achieve this using (1) a for loop, (2) apply(), or (3) perhaps dplyr?

The question "R: t-test over all columns" is related, but it is 6 years old. Perhaps there are better ways to do the same thing now.

Upvotes: 7

Answers (5)

Jia Gao

Reputation: 1292

This should really be a comment rather than an answer, but I am posting it as an answer because the accepted answer, while excellent, has one caveat that can cost others hours, as it did me. The original data posted by the OP:

a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)

The answer provided by @Tung

library(tidyverse)

res <- test_data %>% 
  select_if(is.numeric) %>%
  map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res

The caveat is that this answer only works because grp also exists as a separate vector outside the data frame: select_if(is.numeric) drops the grp column, so t.test(. ~ grp) has to find grp in the calling environment. Keeping the grouping variable outside of the data frame is not common practice as far as I know. So, although the answer is neat, it is worth pointing out this requirement. I am posting this comment-like answer in the hope of saving latecomers some time.
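One possible way around this, sketched here rather than taken from the accepted answer, is to pull the grouping vector out of the data frame inside a small wrapper. The helper name t_test_by_group and its group_col argument are made up for illustration, and the data frame is built so that no standalone grp vector is created.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(333)
# grp exists only as a column here; no standalone grp vector is defined
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Hypothetical helper: t-test of every numeric column against the
# grouping column named in group_col
t_test_by_group <- function(df, group_col) {
  group <- df[[group_col]]  # take the grouping vector from the data frame
  df %>%
    select(where(is.numeric)) %>%
    map_df(~ tidy(t.test(.x ~ group)), .id = "var")
}

res <- t_test_by_group(test_data, "grp")
res
```

If only the columns from the question are needed, res can be trimmed afterwards with select(var, estimate1, estimate2, p.value).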

Upvotes: 1

Tung

Reputation: 28431

Use select_if to select only the numeric columns, then use purrr::map_df to apply t.test against grp. Finally, use broom::tidy to get the results in a tidy format.

library(tidyverse)

res <- test_data %>% 
  select_if(is.numeric) %>%
  map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
#> # A tibble: 3 x 11
#>   var   estimate estimate1 estimate2 statistic p.value parameter conf.low
#>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
#> 1 a       -0.259      9.78      10.0    -0.587   0.565      16.2    -1.19
#> 2 b        0.154     15.0       14.8     0.169   0.868      15.4    -1.78
#> 3 c       -0.359     20.4       20.7    -0.287   0.778      16.5    -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> #   alternative <chr>

Created on 2019-03-15 by the reprex package (v0.2.1.9000)

Upvotes: 8

Parfait

Reputation: 107747

Simply extract the estimate and p-value results from each t.test call while iterating through all needed columns with sapply. Build the formulas from a character vector and transpose with t() for output:

formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")

output <- t(sapply(formulas, function(f) {      
  # data = test_data makes the lookup independent of global a, b, c and grp
  res <- t.test(as.formula(f), data = test_data)
  c(res$estimate, p.value=res$p.value)      
}))

Input data (seeded for reproducibility)

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)

Output result

#         mean in group m mean in group y   p.value
# a ~ grp        9.775477        10.03419 0.5654353
# b ~ grp       14.972888        14.81895 0.8678149
# c ~ grp       20.383679        20.74238 0.7776188

Upvotes: 4

DaWassi

Reputation: 118

As you asked for a for loop:

  a <- rnorm(20, 10, 1)
  b <- rnorm(20, 15, 2)
  c <- rnorm(20, 20, 3)
  grp <- rep(c('m', 'y'),10)
  test_data <- data.frame(a, b, c, grp)  

  meanM=NULL
  meanY=NULL
  p.value=NULL

  for (i in 1:(ncol(test_data)-1)){
    meanM=as.data.frame(rbind(meanM, t.test(test_data[,i] ~ grp)$estimate[1]))
    meanY=as.data.frame(rbind(meanY, t.test(test_data[,i] ~ grp)$estimate[2]))
    p.value=as.data.frame(rbind(p.value, t.test(test_data[,i] ~ grp)$p.value))
   }

  cbind(meanM, meanY, p.value)

It works, but I am a beginner in R, so there may be a more efficient solution.
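A leaner variant of the same loop, offered only as a sketch: it runs t.test once per column and fills a preallocated data frame instead of growing three objects with rbind. The names value_cols, results, mean_m and mean_y are illustrative and do not come from the question.

```r
set.seed(333)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Every column except the grouping column gets tested
value_cols <- setdiff(names(test_data), "grp")

# Preallocate one row per tested column
results <- data.frame(
  mean_m  = numeric(length(value_cols)),
  mean_y  = numeric(length(value_cols)),
  p.value = numeric(length(value_cols)),
  row.names = value_cols
)

for (col in value_cols) {
  # Run the test once and reuse the result for all three values
  tt <- t.test(test_data[[col]] ~ test_data$grp)
  results[col, ] <- unname(c(tt$estimate, tt$p.value))
}

results
```

Group means come back in the order of the factor levels, so the first estimate is group m and the second is group y.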

Upvotes: 2

Rui Barradas

Reputation: 76651

Using lapply, this is rather easy.
I tested the code with set.seed(7060) before creating the dataset, to make the results reproducible.

tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))

result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
pval <- sapply(tests_list, `[[`, "p.value")
result <- cbind(result, p.value = pval)

result
#     mean in group m mean in group y   p.value
#[1,]        9.909818        9.658813 0.6167742
#[2,]       14.578926       14.168816 0.6462151
#[3,]       20.682587       19.299133 0.2735725

Note that a real life application would use names(test_data)[1:3], not letters[1:3], in the first lapply instruction.
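As a rough illustration of that note, the columns can be picked by name and the names reused as row labels. This is a sketch under the assumption that every numeric column should be tested against grp; value_cols is an illustrative name, and reformulate() is used here in place of paste0() plus as.formula().

```r
set.seed(7060)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Pick the columns to test by name instead of hard-coding letters[1:3]
value_cols <- names(test_data)[sapply(test_data, is.numeric)]

tests_list <- lapply(value_cols, function(x) {
  # reformulate("grp", response = "a") builds the formula a ~ grp
  t.test(reformulate("grp", response = x), data = test_data)
})
names(tests_list) <- value_cols  # names carry over as row names below

result <- cbind(
  do.call(rbind, lapply(tests_list, `[[`, "estimate")),
  p.value = sapply(tests_list, `[[`, "p.value")
)
result
```

Naming the list means the rows of result are labelled a, b and c rather than [1,], [2,] and [3,].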

Upvotes: 2
