R: Run multiple post hoc tests at once, using emmeans package

Question

I'm working on a dataset with several different types of proteins as columns. It kinds of looks like this This is simplified, the original dataset contains over 100 types of proteins. I wanted to see if the concentration of a protein differs by treatments when taking random effect (=id) into consideration. I managed to run multiple repeated ANOVA at once. But I would also like to do pairwise comparisons for all proteins based on the treatment. The first thing came to my mind was using emmeans package, but I had trouble coding this.

#install packages 
library(tidyverse)
library(emmeans)

#Create a data set
set.seed(1)
id <- rep(c("1","2","3","4","5","6"),3)
Treatment <- c(rep(c("A"), 6), rep(c("B"), 6),rep(c("C"), 6))
Protein1 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein2 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein3 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))

DF <- data.frame(id, Treatment, Protein1, Protein2, Protein3) %>%
      mutate(id = factor(id),
      Treatment = factor(Treatment, levels = c("A","B","C")))

#First, I tried to run multiple anova, by using lapply
responseList <- names(DF)[c(3:5)]

modelList    <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
aov(mF, data = DF)
})

lapply(modelList, summary)

#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment") 

> wt_emm <- emmeans(modelList, "Treatment")
Error in ref_grid(object, ...) : Can't handle an object of class  “list” 
 Use help("models", package = "emmeans") for information on supported models.

So I tried a different approach

anova2 <- aov(cbind(Protein1,Protein2,Protein3)~ Treatment +Error(id/Treatment), data = DF)
summary(anova2)

#Pairwise comparison using emmeans. 
#I got only result for the whole dataset, instead of by different types of protein.
wt_emm2 <- emmeans(anova2, "Treatment")
pairs(wt_emm2)

> pairs(wt_emm2)
 contrast estimate   SE df t.ratio p.value
 A - B      -1.704 1.05 10 -1.630  0.2782 
 A - C       0.865 1.05 10  0.827  0.6955 
 B - C       2.569 1.05 10  2.458  0.0793

I don't understand why even if I used "cbind(Protein1, Protein2, Protein3)" in the anova model. R still only gives me one result instead of something like the following

this is what I was hoping to get
 > Protein1
     contrast 
     A - B      
     A - C      
     B - C       
> Protein2
     contrast 
     A - B      
     A - C      
     B - C
> Protein3
     contrast 
     A - B      
     A - C      
     B - C

How do I code this or should I try a different package/function?

I don't have trouble running one protein at a time. However, since I have over 100 proteins to run, it would be really time-consuming to code them one by one.

Any suggestion is appreciated. Thank you!

rawr · Accepted Answer

Here

#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment")

you need to lapply over the list like you did with lapply(modelList, summary)

modelList  <- lapply(responseList, function(resp) {
  mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
  aov(mF, data = DF)
})

But when you do this, there is an error:

lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))

Note: re-fitting model with sum-to-zero contrasts Error in terms(formula, "Error", data = data) : object 'mF' not found

attr(modelList[[1]], 'call')$formula
# mF

Note that mF was the name of the formula object, so it seems emmeans needs the original formula for some reason. You can add the formula to the call:

modelList  <- lapply(responseList, function(resp) {
  mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
  av <- aov(mF, data = DF)
  attr(av, 'call')$formula <- mF
  av
})

lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))

# [[1]]
# contrast estimate   SE df t.ratio p.value
#   A - B       -1.89 1.26 10 -1.501  0.3311 
#   A - C        1.08 1.26 10  0.854  0.6795 
#   B - C        2.97 1.26 10  2.356  0.0934 
# 
# P value adjustment: tukey method for comparing a family of 3 estimates 
# 
# [[2]]
# contrast estimate   SE df t.ratio p.value
#   A - B       -1.44 1.12 10 -1.282  0.4361 
#   A - C        1.29 1.12 10  1.148  0.5082 
#   B - C        2.73 1.12 10  2.430  0.0829 
# 
# P value adjustment: tukey method for comparing a family of 3 estimates 
# 
# [[3]]
# contrast estimate   SE df t.ratio p.value
#   A - B       -1.58 1.15 10 -1.374  0.3897 
#   A - C        1.27 1.15 10  1.106  0.5321 
#   B - C        2.85 1.15 10  2.480  0.0765 
# 
# P value adjustment: tukey method for comparing a family of 3 estimates

R: Run multiple post hoc tests at once, using emmeans package

Answers (2)

Related Questions