Reputation: 13
I have a dataset named 'dat' with 5 columns: month; mean0; sd0; mean1; sd1. It looks like the following (but with numbers):
month mean0 sd0 mean1 sd1
1
2
3
..
48
I would like to use an independent (not paired) t-test to compare mean0 and mean1 for every month between 1 and 48. Ideally, the output would be put in another data frame, called 'dat1', with columns for the t-statistic, the degrees of freedom (DF), and the p-value. Like so:
month t-statistic DF p-value
1
2
3
..
48
I have tried using dplyr and broom packages, but cannot seem to figure it out. Any help would be appreciated.
Upvotes: 0
Views: 236
Reputation: 1392
You'll need the sample sizes (n) behind both means and sd's as well. The tsum.test function from the BSDA package will help you do the t-test from summary statistics without your having to write your own function.
There remains the larger question of the advisability of doing a large number of comparisons in this manner. This link provides information about that.
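One common way to handle the multiple-comparisons concern is to adjust the p-values after the fact. The sketch below uses base R's p.adjust; the p-values in it are made up purely for illustration, and which correction (if any) suits your analysis is a judgment call.

```r
# Hypothetical raw p-values from several monthly t-tests (illustrative only)
p_raw <- c(0.001, 0.020, 0.040, 0.300)

# "holm" controls the family-wise error rate;
# "BH" (Benjamini-Hochberg) controls the false discovery rate instead
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

# Side-by-side comparison of raw and adjusted values
adjusted <- data.frame(p_raw, p_holm, p_bh)
print(adjusted)
```

With 48 months you would pass the whole p-value column to p.adjust in a single call, so the correction accounts for all 48 tests.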
With that caveat, here's how to do what you want with some arbitrary data:
library(BSDA)  # provides tsum.test
dat <- data.frame(m1=c(24,11,34),
sd1=c(1.3,4.2,2.3),
n1=c(30, 31, 30),
m2=c(18,8,22),
sd2=c(1.8, 3.4, 1.8),
n2=c(30,31,30))
# user function to do t-test and return desired values
do.tsum <- function(x) {
# tsum.test is quirky, so you have to break out each column's value
results <- tsum.test(x[1],x[2],x[3],x[4],x[5],x[6],alternative='two.sided')
return(c(results$statistic, results$parameters, results$p.value))
}
# use apply to do the tsum.test on each row (1 for rows, 2 for cols)
# then, transpose the resulting matrix and use the data.frame function
t.results <- data.frame(t(apply(dat, 1, do.tsum)))
# unfortunately the p-value is returned without a proper column name (it comes back as 'm1')
# use the names function to change the third column name.
names(t.results)[3] <- 'p.value'
Output is as follows:
t df p.value
1 14.800910 52.78253 1.982944e-20
2 3.091083 57.50678 3.072783e-03
3 22.504396 54.83298 2.277676e-29
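If you would rather not depend on BSDA, the same unequal-variance (Welch) test can be computed directly from the summary statistics in base R. This is only a sketch: it reuses the toy numbers above under the question's column names, and the sample-size columns n0 and n1 are assumed, since the question's data does not include them.

```r
# Same toy numbers as above, renamed to the question's columns.
# n0 and n1 are assumed columns; the original data only has means and sd's.
dat <- data.frame(month = 1:3,
                  mean0 = c(24, 11, 34), sd0 = c(1.3, 4.2, 2.3), n0 = c(30, 31, 30),
                  mean1 = c(18, 8, 22),  sd1 = c(1.8, 3.4, 1.8), n1 = c(30, 31, 30))

# Squared standard error of each group mean
v0 <- dat$sd0^2 / dat$n0
v1 <- dat$sd1^2 / dat$n1

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (dat$mean0 - dat$mean1) / sqrt(v0 + v1)
df     <- (v0 + v1)^2 / (v0^2 / (dat$n0 - 1) + v1^2 / (dat$n1 - 1))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Collect everything in the layout the question asked for
dat1 <- data.frame(month = dat$month, t.statistic = t_stat, DF = df, p.value = p_val)
print(dat1)
```

Because every step is vectorized, this handles all 48 months at once without apply, and the t and df values should agree with the tsum.test output above.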
Upvotes: 1