Gabby

Reputation: 21

How can I use geom_text to add text at top of each box in a box plot (ggplot2)?

I am trying to create a box plot with a text annotation at the top of each box (some boxes are too short for the legend to identify them), but the text ends up at both the top and the bottom of each box.

R version 4.0.0 on Windows 7. Here is some sample data and code:

library(ggplot2)
library(ggthemes)
library(plyr)      # provides mutate()
library(magrittr)
set.seed(42)  ## for sake of reproducibility
n <- 24
dat2 <- data.frame(
  Treatment = rep(c("T1", "T2", "T3", "T4", "T5", "T6"), 2),
  Time = rep(c("Initial", "Final"), n/2),
  Count = sample(20:300, n))
# Keep the Time levels in order of appearance (Initial, then Final)
dat2 <- mutate(dat2, Time = factor(Time, levels = unique(Time)))
p2.dat <- ggplot(data = dat2, aes(x = Treatment, y = Count)) +
  geom_boxplot(aes(fill = Time), width = 0.8, position = position_dodge(width = 1))
# One label is drawn per distinct row, so each box gets text at every Count value
p2.dat + geom_text(aes(label = paste(Time)), position = position_dodge(width = 1), vjust = -0.5, size = 4, stat = "unique", parse = TRUE)

Any help will be appreciated.

Upvotes: 1

Views: 3681

Answers (1)

det

Reputation: 5232

You can summarise the data passed to geom_text so that each label sits at the maximum Count of its group:

library(dplyr)  # group_by(), summarise(), %>%

ggplot(dat2, aes(x = Treatment)) + 
  geom_boxplot(
    aes(y = Count, fill = Time), 
    width = 0.8, 
    position = position_dodge2(width = 1)
  ) +  
  geom_text(
    # One row per box; Time is kept automatically because it is a grouping variable
    data = dat2 %>% group_by(Treatment, Time) %>% summarise(y = max(Count)),
    aes(y = y, label = Time), 
    position = position_dodge(width = 1), 
    vjust = -0.5, 
    size = 4, 
    stat = "unique", 
    parse = TRUE
  )

This places the text above the maximum value of each group, which may be an outlier.

If you want the text above the upper whisker instead, replace the data argument in geom_text with the following, which caps the label position at Q3 + 1.5 * IQR:

dat2 %>% group_by(Treatment, Time) %>% summarise(
  # diff(c(1.5 * Q1, 2.5 * Q3)) equals Q3 + 1.5 * IQR, the upper whisker limit
  y = min(max(Count), diff(quantile(Count, probs = c(0.25, 0.75)) * c(3/2, 5/2)))
)

In this example both methods give the same result because the data contain no outliers.
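
When outliers are present, the whisker limit Q3 + 1.5 * IQR can lie above the point where the whisker is actually drawn, because the whisker stops at the largest observation inside that limit. A minimal sketch of a helper that returns that observation is below. The name upper_whisker and its coef argument are hypothetical (not part of ggplot2 or dplyr), and it assumes the default 1.5 * IQR rule with the default quantile() method:

```r
# Hypothetical helper: end of the upper whisker for a numeric vector,
# assuming the default 1.5 * IQR rule.
upper_whisker <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  fence <- q[2] + coef * (q[2] - q[1])  # Q3 + coef * IQR
  # Largest observation that is not an outlier, never below Q3
  max(q[2], x[x <= fence])
}

# 100 lies beyond the fence (7), so the whisker ends at 4
upper_whisker(c(1, 2, 3, 4, 100))
```

It could be used in the pipeline above as summarise(y = upper_whisker(Count)). For the vector in the example, the min(max(...), ...) expression would give 7 (the fence), while this helper gives 4 (the last non-outlying point).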

Upvotes: 2
