Reputation: 544
I have a database with columns theme (value 0 or 1), level (value 1 to 9) and startTime (double value). For every level, I want to perform a t-test on the startTime values. Here is my code:
database <- read.csv("database.csv")
themeData <- database[database$theme == 1, ]
noThemeData <- database[database$theme == 0, ]
for (i in 1:9) {
x <- themeData[themeData$level == i, ]
y <- noThemeData[noThemeData$level == i, ]
t.test(x$startTime,y$startTime,
alternative = "less")
}
Unfortunately, no t-tests are being executed. In the end, x and y simply get the value for i=9. What am I doing wrong?
Upvotes: 1
Views: 2302
Reputation: 161085
Your code is doing busy work: it is doing the calculations of the t.test
, but since for
loops always discard their implied results, you aren't storing it anywhere. You would have had to use a vector or list (pre-allocated is always better) like so:
res <- replicate(9, NULL)
for (i in 1:9) {
x <- themeData[themeData$level == i, ]
y <- noThemeData[noThemeData$level == i, ]
res[[i]] <- t.test(x$startTime,y$startTime,
alternative = "less")
}
res[[2]]
This can be "good enough" in that it is saving all test "results objects" in a list
for later processing/consumption. A slightly better method is to use one of the *apply
functions; the first two I think of that are directly applicable here (lapply
, sapply(..., simplify=FALSE)
) have various minor advantages, frankly you can choose either.
res <- lapply(c(4, 6, 8), function(thiscyl) {
am0 <- subset(mtcars, am == 0 & cyl == thiscyl)
am1 <- subset(mtcars, am == 1 & cyl == thiscyl)
t.test(am0$mpg, am1$mpg)
})
This is especially beneficial if (unlike here) the tests take a long time: you perform the test and preserve the models, so you can so lots of things to the results without having to rerun the tests. For instance, if you wanted just the p-values:
sapply(res, `[`, "p.value")
# $p.value
# [1] 0.01801712
# $p.value
# [1] 0.187123
# $p.value
# [1] 0.7038727
or more tersely:
sapply(res, `[[`, "p.value")
# [1] 0.01801712 0.18712303 0.70387268
Another example, the confidence intervals, in a matrix:
t(sapply(res, `[[`, "conf.int"))
# [,1] [,2]
# [1,] -9.232108 -1.117892
# [2,] -3.916068 1.032735
# [3,] -2.339549 1.639549
You can always look at a single model with, say, res[[2]]
, but if you need to see all of them you can use just res
and see the whole gamut.
res[[2]]
# Welch Two Sample t-test
# data: am0$mpg and am1$mpg
# t = -1.5606, df = 4.4055, p-value = 0.1871
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -3.916068 1.032735
# sample estimates:
# mean of x mean of y
# 19.12500 20.56667
Upvotes: 4