Alex

Reputation: 371

Tukey's post-hoc on ggplot boxplot

Ok, so I think I'm pretty close with this, but I'm getting an error when I try to construct my boxplot at the end. My goal is to place letters denoting the statistical relationships among the time points above each box. I've seen two discussions of this on this site and can reproduce the results from their code, but I can't apply it to my dataset.

Packages

library(ggplot2)
library(multcompView)
library(plyr)

Here is my data:

dput(WaterConDryMass)
structure(list(ChillTime = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("Pre_chill", 
"6", "13", "24", "Post_chill"), class = "factor"), dmass = c(0.22, 
0.19, 0.34, 0.12, 0.23, 0.33, 0.38, 0.15, 0.31, 0.34, 0.45, 0.48, 
0.59, 0.54, 0.73, 0.69, 0.53, 0.57, 0.39, 0.8)), .Names = c("ChillTime", 
"dmass"), row.names = c(NA, -20L), class = "data.frame")

ANOVA and Tukey Post-hoc

Model4 <- aov(dmass~ChillTime, data=WaterConDryMass)
tHSD <- TukeyHSD(Model4, ordered = FALSE, conf.level = 0.95)
plot(tHSD , las=1 , col="brown" )

Function:

generate_label_df <- function(TUKEY, flev){

  # Extract labels and factor levels from Tukey post-hoc 
  Tukey.levels <- TUKEY[[flev]][,4]
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])

  boxplot.df <- ddply(WaterConDryMass, flev, function (x) max(fivenum(x$y)) + 0.2)

  # Create a data frame out of the factor levels and Tukey's homogenous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
                            stringsAsFactors = FALSE) 

  # Merge it with the labels
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
  return(labels.df)
}  

Boxplot:

ggplot(WaterConDryMass, aes(x = ChillTime, y = dmass)) +
  geom_blank() +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  labs(x = 'Time (weeks)', y = 'Water Content (DM %)') +
  ggtitle(expression(atop(bold("Water Content"), atop(italic("(Dry Mass)"), "")))) +
  theme(plot.title = element_text(hjust = 0.5, face='bold')) +
  annotate(geom = "rect", xmin = 1.5, xmax = 4.5, ymin = -Inf, ymax = Inf, alpha = 0.6, fill = "grey90") +
  geom_boxplot(fill = 'green2', stat = "boxplot") +
  geom_text(data = generate_label_df(tHSD), aes(x = plot.labels, y = V1, label = labels)) +
  geom_vline(aes(xintercept=4.5), linetype="dashed") +
  theme(plot.title = element_text(vjust=-0.6))

Error:

Error in HSD[[flev]] : invalid subscript type 'symbol'

Upvotes: 2

Views: 19291

Answers (1)

J.Con

Reputation: 4309

I think I found the tutorial you are following, or something very similar. You would probably do best to copy and paste this whole thing into your workspace, function and all, so you don't miss any of the small differences.

Basically I have followed the tutorial (http://www.r-graph-gallery.com/84-tukey-test/) to the letter and added a few necessary tweaks at the end. It adds a few extra lines of code, but it works.

generate_label_df <- function(TUKEY, variable){

  # Extract labels and factor levels from Tukey post-hoc 
  Tukey.levels <- TUKEY[[variable]][,4]
  Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])

  #I need to put the labels in the same order as in the boxplot :
  Tukey.labels$treatment=rownames(Tukey.labels)
  Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
  return(Tukey.labels)
}

model=lm(WaterConDryMass$dmass~WaterConDryMass$ChillTime )
ANOVA=aov(model)

# Tukey test to study each pair of treatment :
TUKEY <- TukeyHSD(x=ANOVA, 'WaterConDryMass$ChillTime', conf.level=0.95)

# Generate labels using the function
labels <- generate_label_df(TUKEY, "WaterConDryMass$ChillTime")

# Rename columns for merging
names(labels) <- c('Letters', 'ChillTime')

# Obtain letter position for y axis using group means
yvalue <- aggregate(. ~ ChillTime, data = WaterConDryMass, mean)

# Merge data frames
final <- merge(labels, yvalue)

ggplot(WaterConDryMass, aes(x = ChillTime, y = dmass)) +
  geom_blank() +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  labs(x = 'Time (weeks)', y = 'Water Content (DM %)') +
  ggtitle(expression(atop(bold("Water Content"), atop(italic("(Dry Mass)"), "")))) +
  theme(plot.title = element_text(hjust = 0.5, face='bold')) +
  annotate(geom = "rect", xmin = 1.5, xmax = 4.5, ymin = -Inf, ymax = Inf, alpha = 0.6, fill = "grey90") +
  geom_boxplot(fill = 'green2', stat = "boxplot") +
  geom_text(data = final, aes(x = ChillTime, y = dmass, label = Letters),vjust=-3.5,hjust=-.5) +
  geom_vline(aes(xintercept=4.5), linetype="dashed") +
  theme(plot.title = element_text(vjust=-0.6))
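
For comparison, here is a minimal sketch of a variant that places each letter just above the tallest point of its box rather than at the group mean shifted with vjust. It is an illustration under stated assumptions, not the tutorial's method: the names fit, letters_vec, label_df and ymax are made up for this example, and the 0.05 offset is an arbitrary choice. It also passes the factor name "ChillTime" explicitly. The error in the question is consistent with generate_label_df(tHSD) being called without its second argument, so flev is missing when TUKEY[[flev]] is evaluated.

```r
library(ggplot2)
library(multcompView)

# Data from the question, rebuilt so the sketch is self-contained
WaterConDryMass <- data.frame(
  ChillTime = factor(rep(c("Pre_chill", "6", "13", "24", "Post_chill"), each = 4),
                     levels = c("Pre_chill", "6", "13", "24", "Post_chill")),
  dmass = c(0.22, 0.19, 0.34, 0.12, 0.23, 0.33, 0.38, 0.15, 0.31, 0.34,
            0.45, 0.48, 0.59, 0.54, 0.73, 0.69, 0.53, 0.57, 0.39, 0.8)
)

# Fit with data = so the Tukey component is simply named "ChillTime"
fit  <- aov(dmass ~ ChillTime, data = WaterConDryMass)
tHSD <- TukeyHSD(fit, "ChillTime", conf.level = 0.95)

# Adjusted p-values (column "p adj") -> compact letter display
letters_vec <- multcompLetters(tHSD[["ChillTime"]][, "p adj"])$Letters
label_df <- data.frame(ChillTime = names(letters_vec),
                       Letters   = unname(letters_vec),
                       stringsAsFactors = FALSE)

# Per-group maximum of dmass, used as the anchor for each letter
ymax <- aggregate(dmass ~ ChillTime, data = WaterConDryMass, FUN = max)
ymax$ChillTime <- as.character(ymax$ChillTime)
label_df <- merge(label_df, ymax, by = "ChillTime")

# Restore the factor levels so the x positions match the boxplot order
label_df$ChillTime <- factor(label_df$ChillTime,
                             levels = levels(WaterConDryMass$ChillTime))

p <- ggplot(WaterConDryMass, aes(x = ChillTime, y = dmass)) +
  geom_boxplot(fill = "green2") +
  # 0.05 is an assumed offset; adjust to the scale of your data
  geom_text(data = label_df, aes(y = dmass + 0.05, label = Letters)) +
  theme_bw()
print(p)
```

The remaining layers from the plot above (annotate, geom_vline, the title and theme calls) can be added to p unchanged. Anchoring on the group maximum keeps each letter clear of its box regardless of how the plot is resized, which a fixed vjust does not guarantee.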


Upvotes: 4
