Reputation: 3407
TL;DR edition
I have vectors X1, X2, X3, ..., Xn. I want to test whether the mean of any one vector is significantly different from the mean of any other vector, for every possible pair of vectors. I am looking for a better way to do this in R than running on the order of n^2 individual t.tests (strictly, n(n-1)/2 distinct pairs).
Full Story
I have a data frame full of census data for a particular CSA. Each row contains observations for each variable (column) for a particular census tract.
What I need to do is compare means of the same variable across census tracts in different MSAs. In other words, I want to split my data.frame by the MSA designation variable (one of the columns) and then compare the means of another variable of interest pairwise across the resulting MSA groups. This amounts to running a t.test on every pair of resulting vectors, but I want something more elegant than writing t.test(MSAx, MSAy) over and over again. How can I do this?
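Because each row is already one census tract tagged with its MSA, the data are in "long" format, so no manual splitting is needed before a pairwise test. A minimal sketch, assuming a hypothetical data frame `tracts` with an `MSA` column and a numeric `income` column (simulated here, not real census data):

```r
set.seed(42)
# Hypothetical stand-in for the census data: one row per tract
tracts <- data.frame(
  MSA    = rep(c("MSA_A", "MSA_B", "MSA_C"), times = c(40, 25, 35)),
  income = c(rnorm(40, 50, 10), rnorm(25, 55, 12), rnorm(35, 52, 9))
)

# split() yields the per-MSA vectors described above (one list element per MSA)
by_msa <- split(tracts$income, tracts$MSA)
sapply(by_msa, mean)

# pairwise.t.test() runs every MSA-vs-MSA comparison in one call;
# pool.sd = FALSE gives Welch-style tests, and p-values are Holm-adjusted by default
res <- pairwise.t.test(tracts$income, tracts$MSA, pool.sd = FALSE)
res$p.value
```

The result is a lower-triangular matrix of p-values, one entry per pair of MSAs.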
Upvotes: 5
Views: 17618
Reputation: 101
In addition to the response from quarzgar, there is another way to perform pairwise t-tests across multiple factors in R. The trick is to combine the two (or more) factors into a single factor whose levels are every combination of the original factor levels.
Example with a classic 2x2 design:
df <- data.frame(Id=c(rep(1:100,2),rep(101:200,2)),
dv=c(rnorm(100,10,5),rnorm(100,20,7),rnorm(100,11,5),rnorm(100,12,6)),
Group=c(rep("Experimental",200),rep("Control",200)),
Condition=rep(c(rep("Pre",100),rep("Post",100)),2))
df$Id <- factor(df$Id) # Id must be a factor for the Error() stratum
#ANOVA
summary(aov(dv~Group*Condition+Error(Id/Condition),data = df))
#post-hoc across all factors
df$posthoclevels <- paste(df$Group,df$Condition) #factor combination
pairwise.t.test(df$dv,df$posthoclevels)
# Pairwise comparisons using t tests with pooled SD
#
# data: df$dv and df$posthoclevels
#
#                   Control Post Control Pre Experimental Post
# Control Pre       0.60         -           -
# Experimental Post <2e-16       <2e-16      -
# Experimental Pre  0.26         0.47        <2e-16
#
# P value adjustment method: holm
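As an alternative to paste(), base R's interaction() builds the combined factor directly. A minimal sketch on simulated data shaped like the 2x2 design above (the values are random, so the p-values will not match the output shown):

```r
set.seed(1)
df <- data.frame(
  dv        = c(rnorm(100, 10, 5), rnorm(100, 20, 7), rnorm(100, 11, 5), rnorm(100, 12, 6)),
  Group     = rep(c("Experimental", "Control"), each = 200),
  Condition = rep(rep(c("Pre", "Post"), each = 100), 2)
)

# interaction() joins the levels with the given separator and returns a factor;
# drop = TRUE discards combinations that never occur in the data
df$posthoclevels <- interaction(df$Group, df$Condition, sep = " ", drop = TRUE)
levels(df$posthoclevels)

res <- pairwise.t.test(df$dv, df$posthoclevels)
res
```

Unlike paste(), interaction() returns a factor, so the level order is controlled by the input factors rather than by alphabetical sorting of the pasted strings.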
Upvotes: 0
Reputation: 4632
Just use pairwise.t.test; here is an example:
x1 <- rnorm(50)
x2 <- rnorm(30, mean=0.2)
x3 <- rnorm(100,mean=0.1)
x4 <- rnorm(100,mean=0.4)
x <- data.frame(data=c(x1,x2,x3,x4),
key=c(
rep("x1", length(x1)),
rep("x2", length(x2)),
rep("x3", length(x3)),
rep("x4", length(x4))) )
pairwise.t.test(x$data,
x$key,
pool.sd=FALSE)
# Pairwise comparisons using t tests with non-pooled SD
#
# data: x$data and x$key
#
#    x1     x2     x3
# x2 0.7395 -      -
# x3 0.9633 0.9633 -
# x4 0.0067 0.9633 0.0121
#
# P value adjustment method: holm
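The long data frame does not have to be assembled by hand with rep(): base R's stack() turns a named list of vectors into `values`/`ind` columns, and the object returned by pairwise.t.test() stores the adjusted p-values as a matrix in its `p.value` component. A sketch with the same kind of simulated vectors (Bonferroni is used here only to show the `p.adjust.method` argument; the default is Holm):

```r
set.seed(123)
x1 <- rnorm(50)
x2 <- rnorm(30, mean = 0.2)
x3 <- rnorm(100, mean = 0.1)
x4 <- rnorm(100, mean = 0.4)

# stack() returns a data frame with columns `values` (the data) and `ind` (a factor of list names)
x <- stack(list(x1 = x1, x2 = x2, x3 = x3, x4 = x4))

res <- pairwise.t.test(x$values, x$ind, pool.sd = FALSE, p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (NA above the diagonal)
res$p.value
```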
Upvotes: 9
Reputation: 3037
The advantage of my method below over the one proposed by @ashkan is that it removes duplicates (i.e. either X1 vs X2 or X2 vs X1 appears in the results, not both).
# Generate dummy data
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)
# Create combinations of the variables
combinations <- combn(colnames(df),2, simplify = FALSE)
# Do the t.test on every pair of columns
results <- lapply(combinations, function(pair) {
  t.test(df[[pair[1]]], df[[pair[2]]])
})
# Rename list for legibility, e.g. "X1 vs. X2"
names(results) <- sapply(combinations, paste, collapse = " vs. ")
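Unlike pairwise.t.test(), this loop leaves the p-values unadjusted for multiple comparisons. One way to collect them and apply a correction is sketched below, using base R's p.adjust() with Holm's method (the same default pairwise.t.test() uses); the data are simulated as in the answer above.

```r
set.seed(7)
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)

combinations <- combn(colnames(df), 2, simplify = FALSE)
results <- lapply(combinations, function(pair) t.test(df[[pair[1]]], df[[pair[2]]]))
names(results) <- sapply(combinations, paste, collapse = " vs. ")

# Pull the raw p-value out of each htest object and adjust for the 45 comparisons
p_raw  <- sapply(results, function(r) r$p.value)
p_holm <- p.adjust(p_raw, method = "holm")

summary_table <- data.frame(comparison = names(results),
                            p_raw = p_raw, p_holm = p_holm,
                            row.names = NULL)
head(summary_table)
```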
Upvotes: 8
Reputation: 4930
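If you have a data.frame (here MSA, with one column per MSA) and you want to run an independent t-test between every pair of its columns, you can use a double apply loop. Note that this brute-force approach runs all n^2 tests, including each column against itself and both orderings of every pair: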
apply(MSA, 2, function(x1) {
apply(MSA, 2, function(x2) {
t.test(x1, x2)
})
})
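The nested apply() returns a list of lists of htest objects, which is awkward to scan. A sketch that keeps only the p-values, using sapply() over the columns so the result is an n x n matrix; `MSA` here is a hypothetical simulated data frame standing in for the asker's data. It still runs all n^2 tests (p = 1 on the diagonal, where a column is compared with itself) and applies no multiple-comparison adjustment.

```r
set.seed(99)
# Hypothetical stand-in: each column holds one MSA's values
MSA <- data.frame(A = rnorm(30, 10), B = rnorm(30, 10.5), C = rnorm(30, 12))

# Outer sapply iterates over columns as x1, inner sapply over columns as x2
pmat <- sapply(MSA, function(x1) {
  sapply(MSA, function(x2) t.test(x1, x2)$p.value)
})
round(pmat, 4)
```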
A good visualization to accompany such a brute force approach would be a forest plot:
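# 95% CI for each column mean (normal approximation): mean +/- 1.96 * standard error
cis <- apply(MSA, 2, function(x) mean(x) + c(-1, 1) * 1.96 * sd(x) / sqrt(length(x)))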
plot.new()
plot.window(xlim=c(1, ncol(cis)), ylim=range(cis))
segments(1:ncol(cis), cis[1, ], 1:ncol(cis), cis[2, ])
axis(1, at=1:ncol(cis), labels=colnames(MSA))
axis(2)
box()
abline(h=mean(as.matrix(MSA)), lty='dashed') # grand mean; mean() on a data.frame returns NA
title('Forest plot of 95% confidence intervals of MSA')
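The 1.96 multiplier assumes a normal approximation; for small samples the t-based interval that t.test() already computes is slightly wider. A sketch, again with a hypothetical simulated `MSA` data frame, that builds a `cis` matrix of the same 2 x n shape (row 1 lower bound, row 2 upper bound) so it can be passed to the plotting code above unchanged:

```r
set.seed(99)
# Hypothetical stand-in for the asker's data: one column per MSA
MSA <- data.frame(A = rnorm(30, 10), B = rnorm(30, 10.5), C = rnorm(30, 12))

# t-based 95% CI for each column mean, taken from t.test()'s conf.int component
cis <- sapply(MSA, function(x) t.test(x, conf.level = 0.95)$conf.int)

# The same interval by hand: mean +/- qt(0.975, n - 1) * sd / sqrt(n)
cis_manual <- sapply(MSA, function(x) {
  n <- length(x)
  mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
})
cis
```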
Upvotes: 4