Reputation: 150
Consider the following dataframe:
type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df = data.frame (type, val1, val2)
I have four categories (called types: A, B, C, and D). The three observations of each type can be averaged to create a multivariate mean for that type (composed of the means of val1 and val2). I would like to compare all possible pairs of types (AB, AC, AD, BC, BD, CD) using Hotelling's test to determine which type means (if any) are the same. I could hard code this as:
library(dplyr)  # filter() comes from dplyr
a = filter (df, type == "A") [,2:3]
b = filter (df, type == "B") [,2:3]
c = filter (df, type == "C") [,2:3]
d = filter (df, type == "D") [,2:3]
And then run Hotelling's T2 test for each specified pair of types:
library('Hotelling')
hotelling.test(a, b, shrinkage=FALSE)
hotelling.test(b, c, shrinkage=FALSE)
hotelling.test(a, c, shrinkage=FALSE)
#And so on
This is obviously very inefficient and impractical, given that my actual dataset has 55 different types. I know the answer lies in for loops, but I'm having a hard time figuring out how to tell hotelling.test to compare the val1/val2 multivariate means for all possible type combinations. I'm very new to creating for loops and was hoping someone could point me in the right direction.
After comparing all of the types, I'd then ideally be able to get an output that shows the type pairs for which the Hotelling test p-value was >0.05, meaning that those two types are likely duplicates. In the example dataframe, types A and D return a p-value >0.05, while the other comparisons have p<0.05.
Upvotes: 2
Views: 813
Reputation: 6483
If you want to use for loops:
type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df = data.frame (type, val1, val2)
for (first in unique(df$type)) {
  for (second in unique(df$type)) {
    if (first != second) {
      print(c(first, second))
    }
  }
}
[1] "A" "B"
[1] "A" "C"
[1] "A" "D"
[1] "B" "A"
[1] "B" "C"
[1] "B" "D"
[1] "C" "A"
[1] "C" "B"
[1] "C" "D"
[1] "D" "A"
[1] "D" "B"
[1] "D" "C"
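The loop above visits every pair in both orders (A B and also B A). Since Hotelling's test gives the same result whichever sample is passed first, each pair only needs to be visited once. A minimal sketch of that idea, in base R only, is to loop over positions and start the inner loop one past the outer index. The names `types` and `pairs` are illustrative and not part of the original answer; `types` stands in for `unique(df$type)`.

```r
# Stand-in for unique(df$type); assumes at least two types
types <- c("A", "B", "C", "D")

pairs <- character(0)
for (i in seq_len(length(types) - 1)) {
  # Start at i + 1 so each unordered pair is visited exactly once
  for (j in (i + 1):length(types)) {
    # This is where a call such as hotelling.test() on the two subsets would go
    pairs <- c(pairs, paste(types[i], types[j], sep = "_"))
  }
}
print(pairs)
```

With 55 types this visits 55 * 54 / 2 = 1485 pairs rather than 2970.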
Upvotes: 2
Reputation: 887851
We can use combn to create the pairwise combinations, subset the dataset, and apply the function:
library(Hotelling)
outlst <- combn(as.character(unique(df$type)), 2,
    FUN = function(x) hotelling.test(subset(df, type == x[1], select = -1),
                                     subset(df, type == x[2], select = -1)),
    simplify = FALSE)
names(outlst) <- combn(as.character(unique(df$type)), 2, FUN = paste, collapse = "_")
outlst[1]
#$A_B
#Test stat: 36.013
#Numerator df: 2
#Denominator df: 3
#P-value: 0.007996
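To get from the list of test results to the pairs the question asks about (p-value > 0.05), one option is to pull the p-value out of each result and filter on it. The sketch below assumes that each object returned by hotelling.test stores its p-value in an element named `pval`; that matches my understanding of the Hotelling package, but it is worth confirming with `str(outlst[[1]])` on your installed version. The names `pvals` and `likely_same` are illustrative.

```r
library(Hotelling)

# Example data from the question
type <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 <- c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 <- c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df <- data.frame(type, val1, val2)

# One Hotelling test per unordered pair of types, as in the answer above
types <- as.character(unique(df$type))
outlst <- combn(types, 2,
    FUN = function(x) hotelling.test(subset(df, type == x[1], select = -1),
                                     subset(df, type == x[2], select = -1)),
    simplify = FALSE)
names(outlst) <- combn(types, 2, FUN = paste, collapse = "_")

# Assumption: the p-value is stored in the `pval` element of each result
pvals <- sapply(outlst, function(res) res$pval)

# Pairs whose means are not significantly different at the 0.05 level
likely_same <- names(pvals)[pvals > 0.05]
print(likely_same)
```

With 55 types there are 1485 comparisons, so it may be worth adjusting for multiple testing, for example with `p.adjust(pvals, method = "holm")`, before applying the threshold.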
Upvotes: 2