Rachel

Reputation: 123

Label boxes in ggplot2 boxplot

I would like a label to appear above each box in a boxplot generated by ggplot2.

For example:

#Example data
library(ggplot2)
library(tibble)

test = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
patient = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)
result = c(5, 7, 2, 4, 6, 7, 3, 5, 5, 6, 2, 3)
data <- tibble(test, patient, result)

#Labels I want to include
Alabs = c(1, 3, 500)
Blabs = c(8, 16, -32)

#Plot data
ggplot(data, aes(x = factor(patient), y = result, color = factor(test))) + 
  geom_boxplot(outlier.shape = 1)

This gives a boxplot with a red box (test A) and a blue box (test B) for each patient.

I would like to print the first element of Alabs above the red box for the first patient, the second element of Alabs above the red box for the second patient, the first element of Blabs above the blue box for the first patient, etc.

How do I do this?

Upvotes: 5

Views: 15898

Answers (2)

aosmith

Reputation: 36076

I would make a separate labels dataset for adding the labels.

labs = tibble(test = rep(LETTERS[1:2], each = 3),
                  patient = c(1, 2, 3, 1, 2, 3),
                  labels = c(1, 3, 500, 8, 16, -32) )

   test patient labels
  <chr>   <dbl>  <dbl>
1     A       1      1
2     A       2      3
3     A       3    500
4     B       1      8
5     B       2     16
6     B       3    -32
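If the labels already exist as the Alabs and Blabs vectors from the question, the same table could be built from them instead of retyping the values. This is only a sketch: the name labs_from_vectors is made up for illustration, and it assumes each vector holds one label per patient, in patient order.

```r
library(tibble)

# Label vectors from the question
Alabs <- c(1, 3, 500)
Blabs <- c(8, 16, -32)

# One row per test/patient combination
# (assumption: element i of each vector belongs to patient i)
labs_from_vectors <- tibble(
  test = rep(c("A", "B"), each = length(Alabs)),
  patient = rep(as.numeric(seq_along(Alabs)), times = 2),
  labels = c(Alabs, Blabs)
)
```

patient is kept numeric so that it matches the column type in data when the two are joined.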

The dataset above has the x-axis variable (patient) and the grouping variable (test), but not the y position of the text. To put the labels above the boxes, we can calculate the maximum of result for each test/patient combination and add a small value to get the y position. (geom_text has a useful nudge_y argument, but it doesn't work while dodging.)

I make the summaries per group via dplyr, and then join the y position values to the labels dataset.

library(dplyr)

labeldat = data %>%
     group_by(test, patient) %>%
     summarize(ypos = max(result) + .25 ) %>%
     inner_join(labs, by = c("test", "patient"))

Now you can add the geom_text layer, using the dataset of labels. To dodge the labels the same way as the boxplots, use position_dodge; width = .75 matches the default width of the boxes. To keep letters from showing up in the legend, I use show.legend = FALSE.

ggplot(data, aes(x = factor(patient), y = result, color = test)) + 
     geom_boxplot(outlier.shape = 1) +
     geom_text(data = labeldat, aes(label = labels, y = ypos), 
               position = position_dodge(width = .75), 
               show.legend = FALSE )
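As a possible variation, the fixed + .25 offset could be replaced by vjust, which shifts the text relative to its own height and does work together with dodging. The sketch below is self-contained and repeats the data from the question. The names labeldat_max and p are made up for illustration, and the extra room at the top via expansion() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(dplyr)
library(tibble)

data <- tibble(
  test = rep(c("A", "B"), each = 6),
  patient = rep(c(1, 1, 2, 2, 3, 3), times = 2),
  result = c(5, 7, 2, 4, 6, 7, 3, 5, 5, 6, 2, 3)
)

labs <- tibble(
  test = rep(c("A", "B"), each = 3),
  patient = c(1, 2, 3, 1, 2, 3),
  labels = c(1, 3, 500, 8, 16, -32)
)

# y position is the group maximum itself; the gap comes from vjust below
labeldat_max <- data %>%
  group_by(test, patient) %>%
  summarize(ypos = max(result), .groups = "drop") %>%
  inner_join(labs, by = c("test", "patient"))

p <- ggplot(data, aes(x = factor(patient), y = result, color = test)) +
  geom_boxplot(outlier.shape = 1) +
  geom_text(data = labeldat_max, aes(label = labels, y = ypos),
            position = position_dodge(width = 0.75),
            vjust = -0.5,          # negative vjust moves the text upwards
            show.legend = FALSE) +
  # leave extra room at the top so the highest labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))

print(p)
```

Because vjust is measured in units of text height, the gap between box and label stays roughly constant when the y range of the data changes, whereas a fixed offset such as .25 is in data units.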


Upvotes: 4

Matt

Reputation: 994

It takes a little cheating to get the labels into the same tibble:

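# One label per box; the other row of each pair is NA. Each label is drawn at the
# y value (result) of its own row, so it goes on the row with the larger result.
# If you put the label on the first row instead, it ends up at the bottom of the box.
data$labs = c(NA, 1, NA, 3, NA, 500, NA, 8, NA, 16, NA, -32)
# x position for each label: left of the patient for test A, right for test B
data$lab_x = c(NA, 0.75, NA, 1.75, NA, 2.75, NA, 1.25, NA, 2.25, NA, 3.25)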

Then run ggplot:

 ggplot(data, aes(x = factor(patient), y = result, color = factor(test))) + 
   geom_boxplot(outlier.shape = 1)+
   geom_text(aes(label=labs, x=lab_x))
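The hard-coded x positions above could also be computed. The sketch below assumes two dodged groups and a dodge width of 0.75 (the default boxplot width, as far as I know), in which case the box centers should fall at patient - 0.1875 and patient + 0.1875 rather than at the hand-picked ± 0.25. The names dodge_width, test_levels, n_groups, group_index and data_x are made up for illustration.

```r
library(tibble)
library(dplyr)

data <- tibble(
  test = rep(c("A", "B"), each = 6),
  patient = rep(c(1, 1, 2, 2, 3, 3), times = 2),
  result = c(5, 7, 2, 4, 6, 7, 3, 5, 5, 6, 2, 3)
)

dodge_width <- 0.75                     # assumption: default boxplot width
test_levels <- sort(unique(data$test))  # "A", "B"
n_groups <- length(test_levels)

# Split the dodge width into n_groups equal slots and take each slot's center
data_x <- data %>%
  mutate(
    group_index = match(test, test_levels),
    lab_x = patient + (group_index - (n_groups + 1) / 2) * dodge_width / n_groups
  )
```

The computed lab_x column can then be used in place of the hard-coded one, with NA labels kept on the rows that should not be labelled.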


Upvotes: 1
