R: Doing t-test between pairs of factors

Question

I have an R data frame with an ordered factor variable that has 8 levels. I want to run a t-test between levels 1 & 2, 3 & 4, 5 & 6, and 7 & 8. I could subset the data to extract each pair of levels, but I am wondering whether there is an easier way. I tried the following, but it complains about differing lengths (each level has a different number of observations):

t.test(var1 ~ levels(factorvar)[1:2], data = mydf)

eipi10 · Accepted Answer

I think the error occurs because levels(factorvar)[1:2] returns just two values, the level labels "1" and "2", whereas t.test expects the vectors on both sides of the ~ to have the same length. In other words, the problem is not that the factor levels have different numbers of observations. Rather, if you have, for example, 40 values of var1 for factorvar=1 and 50 values of var1 for factorvar=2, then you need a vector of length 90 on both sides of the ~: the 90 responses on the left and the 90 matching group labels on the right.
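To make the length mismatch concrete, here is a small sketch with made-up data. The data frame demo and its values are invented for illustration; only the column names mirror the question.

```r
# Hypothetical data: 8 factor levels with unequal group sizes
set.seed(1)
demo <- data.frame(
  factorvar = factor(rep(1:8, times = c(4, 5, 3, 6, 4, 5, 3, 6))),
  var1      = rnorm(36)
)

length(levels(demo$factorvar)[1:2])  # 2: only the labels "1" and "2"
nrow(demo)                           # 36: one value of var1 per row

# Subsetting the rows keeps the response and the grouping variable
# the same length, which is what the formula interface needs
sub <- demo[demo$factorvar %in% c(1, 2), ]
nrow(sub)                            # 9: 4 rows from level 1, 5 from level 2
```

The formula var1 ~ factorvar then works on sub because both columns come from the same 9 rows.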

Try this instead:

t.test(var1 ~ factorvar, data=mydf[mydf$factorvar %in% c(1,2),])

You can also write a function so that you don't have to type all that code for each pair of factor levels:

# Function to return p-values from t-test between two factor levels
my.t = function(fac1, fac2){
  t.test(mydf$var1[mydf$factorvar==fac1], 
         mydf$var1[mydf$factorvar==fac2])$p.value
}

# Run the function on factor levels 1 and 2
my.t(1,2)

# Do all four at once
mapply(my.t, seq(1,7,2), seq(2,8,2))

If you want to return the entire output of the t-test for each pair of factor levels (rather than just the p-values), remove the $p.value from the function above and run mapply with SIMPLIFY=FALSE added, so that mapply returns a list of test results instead of trying to simplify them into a matrix.
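As a rough sketch of that variant, the example below uses simulated data in place of mydf. The names mydf_demo, my.t.full and res are made up for this illustration.

```r
# Hypothetical data standing in for mydf
set.seed(42)
mydf_demo <- data.frame(
  factorvar = factor(rep(1:8, each = 10)),
  var1      = rnorm(80)
)

# Same idea as my.t, but returning the whole t.test result
my.t.full <- function(fac1, fac2) {
  t.test(mydf_demo$var1[mydf_demo$factorvar == fac1],
         mydf_demo$var1[mydf_demo$factorvar == fac2])
}

# SIMPLIFY = FALSE keeps the results as a list of "htest" objects
res <- mapply(my.t.full, seq(1, 7, 2), seq(2, 8, 2), SIMPLIFY = FALSE)
names(res) <- paste(seq(1, 7, 2), "vs", seq(2, 8, 2))

res[["1 vs 2"]]                     # full printout for levels 1 and 2
sapply(res, function(x) x$p.value)  # pull one component out of each test
```

Naming the list elements by pair is optional, but it makes it easier to tell which result belongs to which comparison.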

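This is a coding site rather than a statistical advice site, but beware of multiple comparisons: the more t-tests you run, the greater the chance that at least one comes out significant by chance alone.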
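One common way to account for this in R is p.adjust from the base stats package. The sketch below uses placeholder p-values that are not results from any real data, and whether and how to correct remains a statistical decision for your own analysis.

```r
# Placeholder p-values for the four comparisons (illustrative only)
p_raw <- c("1 vs 2" = 0.01, "3 vs 4" = 0.04, "5 vs 6" = 0.03, "7 vs 8" = 0.20)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down alternative that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

cbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm)
```

With the function from the answer, the same call would be applied to the vector returned by mapply(my.t, seq(1,7,2), seq(2,8,2)).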
