TAH

Reputation: 150

How to loop through all possible factor level comparisons in R

Consider the following dataframe:

type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)

df = data.frame (type, val1, val2)

I have four categories (called types; A, B, C, and D). The three observations of each type can be averaged to create a type multivariate mean (composed of the means of val1 and val2). I would like to compare all possible combinations of types (AB, AC, AD, BC, BD, CD) using Hotelling's test to determine which type means (if any) are the same. I could hard code this as:

library(dplyr)  # provides filter()

a = filter (df, type == "A") [,2:3]
b = filter (df, type == "B") [,2:3]
c = filter (df, type == "C") [,2:3]
d = filter (df, type == "D") [,2:3]

And then run Hotelling's T2 test for each specified pair of types:

library('Hotelling')
hotelling.test(a, b, shrinkage=FALSE)
hotelling.test(b, c, shrinkage=FALSE)
hotelling.test(a, c, shrinkage=FALSE)

#And so on

This is obviously very inefficient and impractical, given that my actual dataset has 55 different types. I know the answer lies in for loops, but I'm having a hard time figuring out how to tell hotelling.test to compare the val1/val2 multivariate means for all possible type combinations. I'm very new to creating for loops and was hoping someone could point me in the right direction.

After comparing all of the types, I'd then ideally be able to get an output that shows the type pairs for which the Hotelling test p-value was >0.05, meaning that those two types are likely duplicates. In the example dataframe, types A and D return a p-value >0.05, while the other comparisons have p<0.05.

Upvotes: 2

Views: 813

Answers (2)

dario

Reputation: 6483

If you want to use for loops:

type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)

df = data.frame (type, val1, val2)

for (first in unique(df$type)) {
  for (second in unique(df$type)) {
    if (first != second) {
      print(c(first, second))
    }
  }
}

[1] "A" "B"
[1] "A" "C"
[1] "A" "D"
[1] "B" "A"
[1] "B" "C"
[1] "B" "D"
[1] "C" "A"
[1] "C" "B"
[1] "C" "D"
[1] "D" "A"
[1] "D" "B"
[1] "D" "C"
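
The loop above visits every ordered pair, so each comparison appears twice (A/B and B/A). A minimal sketch of one way to visit each unordered pair once is to loop over positions and start the inner loop one step past the outer one. The names `types` and `pair_labels` are illustrative, and the comment marks where a test call on the two subsets would go:

```r
types <- c("A", "B", "C", "D")
pair_labels <- character(0)

for (i in seq_len(length(types) - 1)) {
  for (j in (i + 1):length(types)) {
    # Each unordered pair is reached exactly once because j > i
    pair_labels <- c(pair_labels, paste(types[i], types[j], sep = "_"))
    # a test on the rows for types[i] and types[j] would be run here
  }
}

print(pair_labels)
# [1] "A_B" "A_C" "A_D" "B_C" "B_D" "C_D"
```

With 55 types this gives 1485 comparisons instead of 2970.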

Upvotes: 2

akrun

Reputation: 887851

We can use combn to create the pairwise combinations, subset the dataset for each pair, and apply the test function to the two subsets:

library(Hotelling)
outlst <- combn(as.character(unique(df$type)), 2, 
    FUN = function(x) hotelling.test(subset(df, type == x[1], select = -1), 
          subset(df, type == x[2], select = -1)), simplify = FALSE)
names(outlst) <- combn(as.character(unique(df$type)), 2, FUN = paste, collapse = "_")

outlst[1]
#$A_B
#Test stat:  36.013 
#Numerator df:  2 
#Denominator df:  3 
#P-value:  0.007996 
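
To get from the per-pair results to the list of pairs with p > 0.05, the p-values need to be collected into one table. The following is a rough, self-contained sketch in base R that swaps in a hand-written two-sample Hotelling T² test with pooled covariance instead of calling the Hotelling package. `hotelling_p`, `results` and `likely_duplicates` are hypothetical names introduced here, not part of any package:

```r
# Example data from the question
df <- data.frame(
  type = rep(c("A", "B", "C", "D"), each = 3),
  val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36),
  val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
)

# Hypothetical helper: two-sample Hotelling T^2 test (pooled covariance),
# returning the F-scaled statistic and its p-value
hotelling_p <- function(x, y) {
  x <- as.matrix(x); y <- as.matrix(y)
  n1 <- nrow(x); n2 <- nrow(y); p <- ncol(x)
  d <- colMeans(x) - colMeans(y)
  s_pooled <- ((n1 - 1) * cov(x) + (n2 - 1) * cov(y)) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(s_pooled, d))
  f_stat <- t2 * (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p)
  p_value <- pf(f_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)
  c(f_stat = f_stat, p_value = p_value)
}

# One column per unordered pair of types
pairs <- combn(sort(unique(as.character(df$type))), 2)
res <- apply(pairs, 2, function(pr) {
  hotelling_p(df[df$type == pr[1], c("val1", "val2")],
              df[df$type == pr[2], c("val1", "val2")])
})

results <- data.frame(first = pairs[1, ], second = pairs[2, ],
                      f_stat = res["f_stat", ], p_value = res["p_value", ])

# Pairs whose means are not significantly different at the 0.05 level
likely_duplicates <- results[results$p_value > 0.05, ]
print(likely_duplicates)
```

For the A/B pair this reproduces the statistic 36.013 and p-value 0.007996 shown above, and the A/D pair comes out with a p-value of about 0.90. If the Hotelling package is used instead, each element of outlst should carry its p-value in a pval element, so something like sapply(outlst, function(x) x$pval) could be filtered the same way; that is worth checking against the package documentation.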

Upvotes: 2
