user5457414

User-defined function to run t-tests between two datasets

I'm a new user trying to figure out lapply.

I have two datasets with the same 30 variables in each, and I'm trying to run a t-test on each variable to compare the two samples. My ideal outcome would be a table that lists each variable along with the t statistic and the p-value for the difference in that variable between the two datasets.

I tried to write a function that runs the t-test so that I could then use it with lapply. Here's my code with a reproducible example:

height<-c(2,3,4,2,3,4,5,6)
weight<-c(3,4,5,7,8,9,5,6)
location<-c(0,1,1,1,0,0,0,1)
data_test<-cbind(height,weight,location)
data_north<-subset(data_test,location==0)
data_south<-subset(data_test,location==1)
variables<-colnames(data_test)
compare_t_tests<-function(x){
  model<-t.test(data_south[[x]], data_north[[x]], na.rm=TRUE)
  return(summary(model[["t"]]), summary(model[["p-value"]]))
}
compare_t_tests(height)

which gives the error:

Error in data_south[[x]] : attempt to select more than one element 

My plan was to use the function in lapply like this, once I figure it out.

 lapply(variables, compare_t_tests)

I'd really appreciate any advice. It seems to me like I might not even be looking at this right, so redirection would also be welcome!

Answers (1)

Ben Bolker

You're very close. There are just a few tweaks:

Data:

height <- c(2,3,4,2,3,4,5,6)
weight <- c(3,4,5,7,8,9,5,6)
location <- c(0,1,1,1,0,0,0,1)

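Use data.frame() instead of cbind(): cbind() returns a matrix, and [[ cannot extract a matrix column by name, whereas data.frame() gives a data frame with real, named columns ...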

data_test <- data.frame(height,weight,location)
data_north <- subset(data_test,location==0)
data_south <- subset(data_test,location==1)
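One way to see the difference is to index both objects the same way. A minimal sketch that rebuilds the toy data from above; `as_matrix`, `as_df`, `df_col`, `mat_err` and `mat_col` are illustrative names, not part of the original code:

```r
height <- c(2, 3, 4, 2, 3, 4, 5, 6)
weight <- c(3, 4, 5, 7, 8, 9, 5, 6)
location <- c(0, 1, 1, 1, 0, 0, 0, 1)

as_matrix <- cbind(height, weight, location)   ## numeric matrix, as in the question
as_df <- data.frame(height, weight, location)  ## data frame: a list of named columns

## [[ with a column name works on the data frame
df_col <- as_df[["height"]]
print(df_col)

## on the matrix, [[ indexes the underlying vector, which has no names,
## so the same call fails; capture the error message instead of stopping
mat_err <- tryCatch(as_matrix[["height"]], error = function(e) conditionMessage(e))
print(mat_err)

## a matrix column has to be taken with [ , ] instead
mat_col <- as_matrix[, "height"]
```

Either works for a single column, but only the data frame lets the answer's function look columns up with [[.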

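Don't include location in the set of variables: it is the grouping variable, so it is constant within each subset, and t.test() would stop with a "data are essentially constant" error ...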

variables <- colnames(data_test)[1:2] ## skip location
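Selecting by position works here, but with 30 variables it may be safer to drop the grouping column by name. A minimal sketch using base R's setdiff(), assuming the grouping column is always called location:

```r
data_test <- data.frame(height = c(2, 3, 4, 2, 3, 4, 5, 6),
                        weight = c(3, 4, 5, 7, 8, 9, 5, 6),
                        location = c(0, 1, 1, 1, 0, 0, 0, 1))

## every column except the grouping variable, whatever its position
variables <- setdiff(names(data_test), "location")
print(variables)
```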

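Extract the components from the model directly rather than calling summary() on them, and return a single vector (return() accepts only one value):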

compare_t_tests <- function(x) {
   ## t.test() already drops NA values and has no na.rm argument
   model <- t.test(data_south[[x]], data_north[[x]])
   unlist(model[c("statistic","p.value")])
}
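t.test() returns an object of class htest, which is a plain list, so its pieces can be pulled out by name. A minimal sketch on the height column that also checks that an NA is ignored without any na.rm argument; `north`, `south`, `res` and `with_na` are illustrative names:

```r
height <- c(2, 3, 4, 2, 3, 4, 5, 6)
location <- c(0, 1, 1, 1, 0, 0, 0, 1)
north <- height[location == 0]
south <- height[location == 1]

model <- t.test(south, north)
print(class(model))   ## "htest"
print(names(model))   ## the components available for extraction

## [ keeps the element names; unlist() flattens them to "statistic.t" and "p.value"
res <- unlist(model[c("statistic", "p.value")])
print(res)

## an NA in either sample is dropped by t.test() itself
with_na <- t.test(c(south, NA), north)
```

Other components (for example the degrees of freedom in "parameter") could be added to the index vector in the same way.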

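Call the function with the variable name in quotation marks, not as a bare symbol (a bare height passes the whole numeric vector, which is why [[ complained about selecting more than one element):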

compare_t_tests("height")
## statistic.t     p.value 
##   0.2335497   0.8236578 
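What matters is what x holds inside the function. A minimal sketch (`mat`, `df`, `by_name` and `by_symbol` are illustrative names) that reproduces the question's error with a bare symbol and shows the quoted form working:

```r
height <- c(2, 3, 4, 2, 3, 4, 5, 6)
weight <- c(3, 4, 5, 7, 8, 9, 5, 6)
mat <- cbind(height, weight)        ## matrix, as in the question
df <- data.frame(height, weight)    ## data frame, as in the answer

## quoted: the index is the single string "height", so [[ selects one column
by_name <- df[["height"]]

## bare symbol: the index is the whole vector c(2, 3, 4, ...), so [[ receives
## eight indices at once; on the matrix this gives the question's error
by_symbol <- tryCatch(mat[[height]], error = function(e) conditionMessage(e))
print(by_symbol)
```

The exact wording of the error differs slightly between R versions, but it always mentions selecting more than one element.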

sapply() simplifies the results into a matrix with one column per variable:

sapply(variables,compare_t_tests)
##                height     weight
## statistic.t 0.2335497 -0.4931970
## p.value     0.8236578  0.6462352

You can transpose this with t() if you prefer one row per variable ...
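Since the original plan was to use lapply(), the same table can also be built from its list output. A minimal sketch that repeats the answer's setup and binds the per-variable vectors into a data frame with one row per variable; the `rows` and `result` names and the `variable` column are illustrative choices:

```r
height <- c(2, 3, 4, 2, 3, 4, 5, 6)
weight <- c(3, 4, 5, 7, 8, 9, 5, 6)
location <- c(0, 1, 1, 1, 0, 0, 0, 1)
data_test <- data.frame(height, weight, location)
data_north <- subset(data_test, location == 0)
data_south <- subset(data_test, location == 1)
variables <- c("height", "weight")

compare_t_tests <- function(x) {
  model <- t.test(data_south[[x]], data_north[[x]])
  unlist(model[c("statistic", "p.value")])
}

## lapply() returns a list of named vectors; rbind() stacks them into a matrix
rows <- lapply(variables, compare_t_tests)
result <- data.frame(variable = variables, do.call(rbind, rows),
                     stringsAsFactors = FALSE)
print(result)
```

With 30 variables the call is unchanged; only `variables` grows.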