jackson883

Reputation: 71

How to apply a grouped t.test to multiple columns in R?

I want to run t.test on q1, comparing group A with group B, in df:

q1 q2 q3 group
1  0  1  A
0  1  0  B
1  1  1  A
0  1  0  B

The script for a single column is:

t.test(subset(df, group == "A", select = c("q1")), subset(df, group == "B", select = c("q1")), alternative = "two.sided")

I wrapped the t.test call in a function:

x <- function(qnum) {t.test(subset(df, group == "A", select = c("qnum")), subset(df, group == "B", select = c("qnum")), alternative = "two.sided")}

Then I thought apply could give me the t.test results for q1, q2, q3, ...

y<-select(df,grep("q\\d",colnames(df),perl=TRUE))
apply(y,2,x)

but it fails with an error:

Error in `[.data.frame`(x, r, vars, drop = drop) :

How can I automatically get the t.test results for multiple columns?

Upvotes: 0

Views: 2686

Answers (1)

Simon Jackson

Reputation: 3184

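Your function fails because select = c("qnum") looks for a column literally named qnum, and apply() passes each column's values to the function, not its name. You can handle this more cleanly using a formula in t.test(). For example, t.test(q1 ~ group, data = df).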

Below I'll demonstrate by simulating data, using a formula, then using lapply() to run t.test() for every column (except group):

# Create data
set.seed(123)  # This makes sampling replicable
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE)
)

head(d)
#>            q1         q2         q3 group
#> 1 -0.56047565 -1.0678237 -0.6947070     B
#> 2 -0.23017749 -0.2179749 -0.2079173     A
#> 3  1.55870831 -1.0260044 -1.2653964     A
#> 4  0.07050839 -0.7288912  2.1689560     A
#> 5  0.12928774 -0.6250393  1.2079620     A
#> 6  1.71506499 -1.6866933 -1.1231086     B

# Example of using a formula
t.test(d$q1 ~ d$group)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  d$q1 by d$group
#> t = -0.76262, df = 17.323, p-value = 0.4559
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2294678  0.5759458
#> sample estimates:
#> mean in group A mean in group B 
#>     -0.05443279      0.27232820


# How to apply t.test to every column with lapply()
# - d[,-4] is all data excluding `group` variable
lapply(d[,-4], function(i) t.test(i ~ d$group))
#> $q1
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = -0.76262, df = 17.323, p-value = 0.4559
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2294678  0.5759458
#> sample estimates:
#> mean in group A mean in group B 
#>     -0.05443279      0.27232820 
#> 
#> 
#> $q2
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = -1.6467, df = 17.731, p-value = 0.1172
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -1.2881952  0.1568201
#> sample estimates:
#> mean in group A mean in group B 
#>      -0.3906697       0.1750179 
#> 
#> 
#> $q3
#> 
#>  Welch Two Sample t-test
#> 
#> data:  i by d$group
#> t = 0.52889, df = 13.016, p-value = 0.6058
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.7569843  1.2478547
#> sample estimates:
#> mean in group A mean in group B 
#>     0.253746354     0.008311147
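
If you only need a few numbers from each test rather than the full printout, one option is to collect them into a data frame. The following is a minimal sketch under some assumptions: it rebuilds the simulated data from above, picks the question columns by name with a regular expression instead of by position, and builds each formula with reformulate(). The names q_cols, tests and summary_table are made up for this illustration.

```r
# Recreate the simulated data so this block runs on its own
set.seed(123)
d <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20),
  group = sample(c("A", "B"), size = 20, replace = TRUE)
)

# Pick the question columns by name (q1, q2, q3, ...)
q_cols <- grep("^q\\d+$", names(d), value = TRUE)

# One Welch t-test per column; reformulate() builds e.g. q1 ~ group
tests <- lapply(q_cols, function(col) {
  t.test(reformulate("group", response = col), data = d)
})
names(tests) <- q_cols

# Extract the statistic, degrees of freedom and p-value from each "htest" object
summary_table <- data.frame(
  question = q_cols,
  t        = vapply(tests, function(x) unname(x$statistic), numeric(1)),
  df       = vapply(tests, function(x) unname(x$parameter), numeric(1)),
  p_value  = vapply(tests, function(x) x$p.value, numeric(1)),
  row.names = NULL
)

summary_table
```

Selecting columns by name avoids depending on group being the fourth column. With many questions you would probably also want to adjust the p-values for multiple comparisons, for example with p.adjust().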

Upvotes: 5
