R: t-test over all subsets over all columns

Question

This is a follow up question from R: t-test over all columns

Suppose I have a huge data set, and then I created numerous subsets based on certain conditions. The subsets should have the same number of columns. Then I want to do t-test on two subsets at a time (outer loop) and then for each combination of subsets go through all columns one column at a time (inner loop).

Here is what I have come up with based on previous answer. This one stops with an error.

C <- c("c1","c1","c1","c1","c1",
   "c2","c2","c2","c2","c2",
   "c3","c3","c3","c3","c3",
   "c4","c4","c4","c4","c4",
   "c5","c5","c5","c5","c5",
   "c6","c6","c6","c6","c6",
   "c7","c7","c7","c7","c7",
   "c8","c8","c8","c8","c8",
   "c9","c9","c9","c9","c9",
   "c10","c10","c10","c10","c10")
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(C, X, Y, Z)

Data.c1 = subset(Data, C == "c1",select=X:Z)
Data.c2 = subset(Data, C == "c2",select=X:Z)
Data.c3 = subset(Data, C == "c3",select=X:Z)
Data.c4 = subset(Data, C == "c4",select=X:Z)
Data.c5 = subset(Data, C == "c5",select=X:Z)

Data.Subsets = c("Data.c1",
                 "Data.c2",
                 "Data.c3",
                 "Data.c4",
                 "Data.c5") 

library(plyr)

combo1 <- combn(length(Data.Subsets),1)
adply(combo1, 1, function(x) {

  combo2 <- combn(ncol(Data.Subsets[x]),2)
  adply(combo2, 2, function(y) {

      test <- t.test( Data.Subsets[x][, y[1]], Data.Subsets[x][, y[2]], na.rm=TRUE)

      out <- data.frame("Subset" = rownames(Data.Subsets[x]),
                    , "Row" = colnames(x)[y[1]]
                    , "Column" = colnames(x[y[2]])
                    , "t.value" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
                    )
      return(out)
  } )
} )

Richie Cotton · Accepted Answer

First of all, you can more easily define you dataset using gl, and by avoiding creating individual variables for the columns.

Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

Convert this to "long" format using melt from the reshape package. (You can also use the base reshape function.)

longData <- melt(Data, id.vars = "C")

Now Use pairwise.t.test to compute t tests on all pairs of X/Y/Z for for each level of C.

with(longData, pairwise.t.test(value, interaction(C, variable)))

Note that it is important to use pairwise.t.test rather than just lots of individual calls to t.test because you need to adjust your p values if you run lots of tests. (See, e.g., xkcd for explanation.)

In general, pairwise t tests are inferior to a regression so be careful about their usage.

R: t-test over all subsets over all columns

Answers (2)

Related Questions