David

Reputation: 33

Label a Boxplot with number of subgroups and observations per box

I want to make a boxplot where each box is labelled with both the number of observations in that box and the number of subgroups in that box.

I can get close to what I want with the following code, using the diamonds data set included in the ggplot2 package:

library(ggplot2)
data("diamonds")

# Return the label (number of observations) and its y position for one box
n_fun <- function(x) {
  data.frame(y = 1, label = length(x))
}

ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot(position = position_dodge2(width = 0.75, preserve = "single")) +
  theme_bw() +
  stat_summary(fun.data = n_fun, geom = "text", aes(group = clarity),
               hjust = 0.5, position = position_dodge(0.6))

This gives me a plot that displays the number of observations for each box. What I'd like to do is display both the number of observations and the number of colors in each box. For example:

Fair_I1<-subset(diamonds, cut=="Fair" & clarity=="I1")
table(Fair_I1$color)

shows that there are 7 color groups present in the box for Fair-I1.

So the final plot would show both 7 (the number of colors) and 210 (the number of observations) under or over this box.

Upvotes: 1

Views: 808

Answers (1)

Iroha

Reputation: 34751

You can summarise the data beforehand and pass the summary data to geom_text(). Here I collapsed both values into a single label (number of observations on the first line, number of distinct colors on the second), but you could compute them separately and add them as separate layers if you wanted, for example, one set of numbers at the top and the other at the bottom.

library(ggplot2)
library(dplyr)

# One label per box: number of observations, then number of distinct colors
labeldat <- diamonds %>%
  group_by(cut, clarity) %>%
  summarise(labels = paste(n(), n_distinct(color), sep = "\n"), .groups = "drop")

ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot(position = position_dodge2(width = 0.75)) +
  theme_bw() +
  # y = -250 places the labels just below the boxes
  geom_text(data = labeldat, aes(x = cut, y = -250, label = labels),
            hjust = 0.5, position = position_dodge2(width = 0.75))
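
If you would rather keep the two numbers apart, a rough sketch of the separate-layers variant could look like this. The names counts, y_bottom and y_top, the 500 offsets, the text size and the explicit group = clarity are my own choices for illustration, not anything required by ggplot2, so adjust them to your plot.

```r
library(ggplot2)
library(dplyr)

# One row per box: number of observations and number of distinct colors
counts <- diamonds %>%
  group_by(cut, clarity) %>%
  summarise(n_obs = n(), n_colors = n_distinct(color), .groups = "drop")

# Assumed label positions: a little below zero and a little above the top price
y_bottom <- -500
y_top <- max(diamonds$price) + 500

p <- ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot(position = position_dodge2(width = 0.75)) +
  theme_bw() +
  # Observations per box, below the boxes
  geom_text(data = counts, aes(y = y_bottom, label = n_obs, group = clarity),
            size = 2.5, position = position_dodge2(width = 0.75)) +
  # Distinct colors per box, above the boxes
  geom_text(data = counts, aes(y = y_top, label = n_colors, group = clarity),
            size = 2.5, position = position_dodge2(width = 0.75))
p
```

For the Fair-I1 box from the question, the bottom label should read 210 and the top label 7.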


Upvotes: 2
