Reputation: 992
This is a follow up question from R: t-test over all columns
Suppose I have a huge data set, and then I created numerous subsets based on certain conditions. The subsets should have the same number of columns. Then I want to do t-test on two subsets at a time (outer loop) and then for each combination of subsets go through all columns one column at a time (inner loop).
Here is what I have come up with based on previous answer. This one stops with an error.
C <- c("c1","c1","c1","c1","c1",
"c2","c2","c2","c2","c2",
"c3","c3","c3","c3","c3",
"c4","c4","c4","c4","c4",
"c5","c5","c5","c5","c5",
"c6","c6","c6","c6","c6",
"c7","c7","c7","c7","c7",
"c8","c8","c8","c8","c8",
"c9","c9","c9","c9","c9",
"c10","c10","c10","c10","c10")
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(C, X, Y, Z)
Data.c1 = subset(Data, C == "c1",select=X:Z)
Data.c2 = subset(Data, C == "c2",select=X:Z)
Data.c3 = subset(Data, C == "c3",select=X:Z)
Data.c4 = subset(Data, C == "c4",select=X:Z)
Data.c5 = subset(Data, C == "c5",select=X:Z)
Data.Subsets = c("Data.c1",
"Data.c2",
"Data.c3",
"Data.c4",
"Data.c5")
library(plyr)
combo1 <- combn(length(Data.Subsets),1)
adply(combo1, 1, function(x) {
combo2 <- combn(ncol(Data.Subsets[x]),2)
adply(combo2, 2, function(y) {
test <- t.test( Data.Subsets[x][, y[1]], Data.Subsets[x][, y[2]], na.rm=TRUE)
out <- data.frame("Subset" = rownames(Data.Subsets[x]),
, "Row" = colnames(x)[y[1]]
, "Column" = colnames(x[y[2]])
, "t.value" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
} )
} )
Upvotes: 1
Views: 3988
Reputation: 4939
You can use get(Data.subset[x])
which will pick out the relevant data frame. But I don't think this should be necessary.
Explicitly subsetting that many times shoudn't be necessry either. You could create them using something like
conditions = c("c1", "c2", "c3", "c4", "c5")
dfs <- lapply(conditions, function(x){subset(Data, C==x, select=X:Z)})
That should (didn't test it) return a list of data frames each subseted on the various conditions you passed it.
However it would be a much better idea as @Richie Cotton points out, to reshape your data frame and use pairwise t tests.
I should point out that doing this many t-tests doesn't seem wise. Even after correction for multiple testing, be it FDR, permutation or otherwise. It would be better to try and figure out if you can use an anova of some sort as they are used for almost exactly this purpose.
Upvotes: 1
Reputation: 121077
First of all, you can more easily define you dataset using gl
, and by avoiding creating individual variables for the columns.
Data <- data.frame(
C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
X = rnorm(n = 50, mean = 10, sd = 5),
Y = rnorm(n = 50, mean = 15, sd = 6),
Z = rnorm(n = 50, mean = 20, sd = 5)
)
Convert this to "long" format using melt
from the reshape
package. (You can also use the base reshape
function.)
longData <- melt(Data, id.vars = "C")
Now Use pairwise.t.test
to compute t tests on all pairs of X/Y/Z for for each level of C.
with(longData, pairwise.t.test(value, interaction(C, variable)))
Note that it is important to use pairwise.t.test
rather than just lots of individual calls to t.test
because you need to adjust your p values if you run lots of tests. (See, e.g., xkcd for explanation.)
In general, pairwise t tests are inferior to a regression so be careful about their usage.
Upvotes: 6