kz1818

Reputation: 13

Performing T test by columns

data

As shown in the dataset above, I have a column named "Group" that labels each row as belonging to group type p or group type f. I want to run a t-test comparing the two groups for each variable (1, 2, ..., x) and extract the p-values. I know how to run a t-test on a single column/variable, as shown in the code below.

 t.test(T1_All[["Variable 1"]] ~ Group, data = T1_All, var.equal = TRUE)

Note: T1_All is the name of my dataset

What I want to do is run a t-test on each column using apply(), so that I don't have to repeat the t-test 96 times, once for each of my variables. Here is my shoddy attempt at a solution:

apply(T1_All, 2, function(x) t.test(T1_All[[x]] ~ Group, T1_All, var.equal = TRUE)) 

And here is the error message

Error in t.test.formula(T1_All[[i]] ~ Group, T1_All) :
  grouping factor must have exactly 2 levels

apply(T1_All, 2, function(x) t.test(T1_All[[x]]~Group, T1_All))

Error in .subset2(x, i) : no such index at level 1

-end code-

Furthermore, when the apply function is used on the data frame, I would like to collect the values returned by the t-test (p-value, mean values for each variable, etc.) in a separate table. I have read some other posts on the tidy package, but I'm still not sure how to approach this problem.

I have very little coding experience so any help would be appreciated. Thank you!

Upvotes: 0

Views: 252

Answers (2)

kstew

Reputation: 1114

You can use a combination of tidyr, dplyr and plyr calls to reshape the data to long format, run the t-tests and extract the p-values to a tidy data frame.

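library(plyr)   # ddply(); load before dplyr to avoid masking its functions
library(dplyr)  # %>%
library(tidyr)  # gather()

T1_All <- data.frame(Group = sample(c('p', 'f'), 100, TRUE), matrix(rnorm(1000), ncol = 10))

T1_All %>% gather(k, v, -Group) %>%
  ddply(., .(k), function(x) t.test(x$v ~ x$Group)$p.value)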

     k         V1
1   X1 0.99792904
2  X10 0.96577838
3   X2 0.31467877
4   X3 0.58195417
5   X4 0.41397033
6   X5 0.86034057
7   X6 0.08868437
8   X7 0.53494848
9   X8 0.73073014
10  X9 0.18215440

Upvotes: 0

user11937744


An option would be lapply. Get the names of the columns other than 'Group', loop through them with lapply, build the formula with paste0, and pass it to t.test.

vec <- setdiff(names(T1_All), "Group")
lapply(vec, function(x) t.test(as.formula(paste0(x,  '~ Group')), 
         T1_All, var.equal = TRUE))

data

set.seed(2)
T1_All <- data.frame(Group = rep(c("P", "f"), each = 10), Measurement1 = rnorm(20), Measurement2 = rnorm(20) )
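To also collect the results in a separate table, as the question asks, the list of t-test results can be reduced to one row per variable. The following is only a minimal base R sketch under some assumptions: it reuses the sample data above, the names `vars`, `rows` and `results` and the output column names are made up for illustration, and the group labels "P" and "f" are hard-coded to match that sample data.

```r
set.seed(2)
T1_All <- data.frame(Group = rep(c("P", "f"), each = 10),
                     Measurement1 = rnorm(20),
                     Measurement2 = rnorm(20))

# Every column except the grouping column is treated as a variable to test.
vars <- setdiff(names(T1_All), "Group")

# Build one small data frame per variable: group means, t statistic, p-value.
rows <- lapply(vars, function(v) {
  # The column is indexed by name, so names containing spaces also work here.
  res <- t.test(T1_All[[v]] ~ T1_All$Group, var.equal = TRUE)
  means <- tapply(T1_All[[v]], T1_All$Group, mean)
  data.frame(variable  = v,
             mean_P    = means[["P"]],   # assumes a group labelled "P"
             mean_f    = means[["f"]],   # assumes a group labelled "f"
             statistic = unname(res$statistic),
             p_value   = res$p.value)
})

# Stack the per-variable rows into a single results table.
results <- do.call(rbind, rows)
results
```

With 96 variables, `results` would have 96 rows, and `results$p_value` could then be sorted or filtered like any other column.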

Upvotes: 1
