Jeroen

Reputation: 544

How to perform the same t-test in a for loop?

I have a database with columns theme (value 0 or 1), level (value 1 to 9) and startTime (double value). For every level, I want to perform a t-test on the startTime values. Here is my code:

database <- read.csv("database.csv")
themeData <- database[database$theme == 1, ]
noThemeData <- database[database$theme == 0, ]

for (i in 1:9) {
  x <- themeData[themeData$level == i, ]
  y <- noThemeData[noThemeData$level == i, ]
  t.test(x$startTime,y$startTime,
         alternative = "less")
}

Unfortunately, no t-tests are being executed. In the end, x and y simply get the value for i=9. What am I doing wrong?

Upvotes: 1

Views: 2302

Answers (1)

r2evans

Reputation: 161085

Your code is doing busy work: it runs the calculations for each t.test, but a for loop discards the value of every expression in its body (nothing is auto-printed inside a loop), so the results are never shown or stored anywhere. You need to save them in a vector or list (pre-allocating it is generally better), like so:

res <- vector("list", 9)  # pre-allocated list with one empty slot per level
for (i in 1:9) {
  x <- themeData[themeData$level == i, ]
  y <- noThemeData[noThemeData$level == i, ]
  res[[i]] <- t.test(x$startTime,y$startTime,
                     alternative = "less")
}
res[[2]]
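
For anyone without the original database.csv, here is a minimal sketch of the same loop on simulated data. The data frame contents, the seed, and the group sizes are made up for illustration and are not the asker's real data. It also hints at the other simple fix: because nothing is auto-printed inside a for loop, an explicit print() is enough if you only want to see each result as it is computed.

```r
# Simulated stand-in for database.csv (assumption: NOT the asker's real data)
set.seed(42)
database <- data.frame(
  theme     = rep(c(0, 1), each = 90),   # 90 rows without theme, 90 with
  level     = rep(1:9, times = 20),      # levels 1..9, 10 rows per level and theme
  startTime = runif(180, min = 0, max = 60)
)
themeData   <- database[database$theme == 1, ]
noThemeData <- database[database$theme == 0, ]

res <- vector("list", 9)                 # one slot per level
for (i in 1:9) {
  x <- themeData[themeData$level == i, ]
  y <- noThemeData[noThemeData$level == i, ]
  res[[i]] <- t.test(x$startTime, y$startTime, alternative = "less")
  # print(res[[i]])                      # uncomment to display each test as it runs
}
length(res)                              # number of stored test objects
```

Each element of res is an ordinary "htest" object, so res[[i]]$p.value, res[[i]]$statistic and so on are available afterwards.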

This can be "good enough" in that it saves all of the test result objects in a list for later processing/consumption. A slightly better method is to use one of the *apply functions. The first two I think of that are directly applicable here, lapply and sapply(..., simplify=FALSE), each have minor advantages; frankly, you can choose either. Here is the pattern with lapply, demonstrated on the built-in mtcars data:

res <- lapply(c(4, 6, 8), function(thiscyl) {
  am0 <- subset(mtcars, am == 0 & cyl == thiscyl)
  am1 <- subset(mtcars, am == 1 & cyl == thiscyl)
  t.test(am0$mpg, am1$mpg)
})
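
As a rough sketch of the sapply(..., simplify=FALSE) alternative mentioned above: when the input is a character vector, sapply reuses its values as the names of the returned list, which makes it easier to pick out a result by group. The names cyls and res_named are only illustrative.

```r
# Character input so that sapply() can reuse the values as list names
cyls <- c("4", "6", "8")

res_named <- sapply(cyls, function(thiscyl) {
  # convert back to a number for the comparison against mtcars$cyl
  am0 <- subset(mtcars, am == 0 & cyl == as.numeric(thiscyl))
  am1 <- subset(mtcars, am == 1 & cyl == as.numeric(thiscyl))
  t.test(am0$mpg, am1$mpg)
}, simplify = FALSE)                 # keep the full test objects in a list

names(res_named)                     # the list is named after the cylinder counts
res_named[["6"]]$p.value             # look up one test by name instead of position
```

With lapply you would get the same tests in an unnamed list, so the choice mostly comes down to whether you prefer indexing by position or by name.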

This is especially beneficial if (unlike here) the tests take a long time: you perform each test once and preserve the result objects, so you can do lots of things with the results without having to rerun the tests. For instance, if you wanted just the p-values:

sapply(res, `[`, "p.value")
# $p.value
# [1] 0.01801712
# $p.value
# [1] 0.187123
# $p.value
# [1] 0.7038727

or, more tersely, as a plain numeric vector (using `[[` instead of `[`):

sapply(res, `[[`, "p.value")
# [1] 0.01801712 0.18712303 0.70387268

Another example, the confidence intervals, in a matrix:

t(sapply(res, `[[`, "conf.int"))
#           [,1]      [,2]
# [1,] -9.232108 -1.117892
# [2,] -3.916068  1.032735
# [3,] -2.339549  1.639549
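
If a single table is more convenient than separate vectors and matrices, the same extractions can be combined into a data frame. This is just one possible layout; the column names cyl, conf.low and conf.high and the name summary_df are my own choices, not anything t.test provides.

```r
# Re-create the list of tests so this snippet runs on its own
res <- lapply(c(4, 6, 8), function(thiscyl) {
  am0 <- subset(mtcars, am == 0 & cyl == thiscyl)
  am1 <- subset(mtcars, am == 1 & cyl == thiscyl)
  t.test(am0$mpg, am1$mpg)
})

# One row per test: group label, p-value, and both ends of the confidence interval
summary_df <- data.frame(
  cyl       = c(4, 6, 8),
  p.value   = sapply(res, `[[`, "p.value"),
  conf.low  = sapply(res, function(r) r$conf.int[1]),
  conf.high = sapply(res, function(r) r$conf.int[2])
)
summary_df
```

The p.value and confidence interval columns should match the values shown above, since they come from the same stored test objects.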

You can always look at a single test with, say, res[[2]]; if you need to see all of them, just print res to see the whole gamut.

res[[2]]
#   Welch Two Sample t-test
# data:  am0$mpg and am1$mpg
# t = -1.5606, df = 4.4055, p-value = 0.1871
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -3.916068  1.032735
# sample estimates:
# mean of x mean of y 
#  19.12500  20.56667 

Upvotes: 4
