Ratnanil

Reputation: 1752

Add a number of observations per group AND SUBGROUP in ggplot2 boxplot

This might seem like a duplicate of this question, but in fact I want to expand the original question.

I want to annotate the boxplot with the number of observations per group AND SUBGROUP in ggplot. Following the example of the original post, here is my minimal example:

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text", fun.y = median)

My problem is that the sample counts all line up in the center of each group, rather than being plotted on the appropriate boxplot:

[Image: annotations centered in the middle of each group rather than on the appropriate boxplot]
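For reference, the counts that the labels are expected to show can be computed directly from the data. This is only a cross-check in base R, not part of the plot; the names `counts` and `counts_long` are chosen here for illustration.

```r
# Count observations for every cyl x gear combination in mtcars.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(counts)

# The same counts in long form, keeping only combinations that occur,
# i.e. one row per boxplot that ggplot will draw.
counts_long <- as.data.frame(counts, stringsAsFactors = FALSE)
counts_long <- counts_long[counts_long$Freq > 0, ]
print(counts_long)
```

Each row of `counts_long` corresponds to one box, so the plot should carry one label per row.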

Upvotes: 2

Views: 2287

Answers (2)

MLavoie

Reputation: 9886

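Is this what you want? Adding position = position_dodge(width = 0.75) to stat_summary() dodges the text labels by subgroup, so each count lands on its own box instead of in the center of the group: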

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
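  # position_dodge() moves each label onto its own box. fun.y is omitted:
  # it is ignored when fun.data is supplied and deprecated since ggplot2 3.3.0.
  stat_summary(fun.data = give.n, geom = "text", position = position_dodge(width = 0.75))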
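A variation on the same idea, as a sketch: if the label should be text such as "n = 12", the summary function can return a data frame instead of a vector, because `c()` would coerce `y` to character alongside a character label. The helper name `n_label` and the choice to place labels one unit above each subgroup's maximum are assumptions made here for illustration.

```r
library(ggplot2)

# Hypothetical helper: returns a data frame so that y stays numeric
# while the label is a character string such as "n = 12".
n_label <- function(x) {
  data.frame(y = max(x) + 1, label = paste0("n = ", length(x)))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text",
               position = position_dodge(width = 0.75))
print(p)
```

The offset of 1 is in units of mpg and will likely need adjusting for other data.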


Upvotes: 3

E. Nygaard

Reputation: 181

In case anyone else is having trouble positioning the text at suitable locations, here is my modification of the answer from @MLavoie:

require(ggplot2)

give.n <- function(x){
  
  # Calculate the third quartile (q3) and the distance between the median and
  # q3:
  q3 <- quantile( x, probs = c(0.75), names = FALSE )
  distance_between_median_and_q3 <- ( q3 - median(x))
  
  # If the distance between the median and 3rd quartile is large enough, place
  # text halfway between the median and 3rd quartile:
  if( distance_between_median_and_q3 > 0.8 ){
    return( c( 
      y = median(x) + (q3 - median(x))/2
      , label = length(x) )) 
  } else{
    # If the distance is too small, either:
    
    # 1) place text above upper whisker *as long as* IQR > 0,
    if(IQR(x) > 0 ){
      # The upper whisker is the largest value within 1.5 * IQR of q3 (inclusive)
      upper_whisker <- max( x[ x <= (q3 + 1.5 * IQR(x)) ])
      
      return( c( 
        y = upper_whisker * 1.03
        , label = length(x) )) 
    } else{
      # or 
      # 2) place text above median
      return( c( 
        y = median(x) * 1.03
        , label = length(x) )) 
    }
  }
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary( fun.data = give.n
                , geom = "text"
                # , fun.y = median
                , position = position_dodge( width = 0.75 ) 
  )

Please note that you might have to experiment with some of the values or code in the give.n function to get it to work with your data. But as you can see, it is possible to make give.n quite flexible.
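One way to make that experimentation easier, as a sketch only: expose the threshold and the multiplier as arguments and inspect the resulting positions per subgroup before plotting. The name `give_n` and the arguments `min_gap` and `nudge` are hypothetical; the logic mirrors the function above, using base R only.

```r
# Hypothetical parameterised variant of give.n (base R only).
# min_gap: minimum median-to-q3 distance for placing text inside the box.
# nudge:   multiplier used when the text goes above the whisker or median.
give_n <- function(x, min_gap = 0.8, nudge = 1.03) {
  med <- median(x)
  q3 <- quantile(x, probs = 0.75, names = FALSE)
  iqr <- IQR(x)
  if (q3 - med > min_gap) {
    y <- med + (q3 - med) / 2                  # halfway between median and q3
  } else if (iqr > 0) {
    y <- max(x[x <= q3 + 1.5 * iqr]) * nudge   # just above the upper whisker
  } else {
    y <- med * nudge                           # just above the median
  }
  c(y = y, label = length(x))
}

# Inspect the label position and count for every cyl x gear subgroup.
groups <- split(mtcars$mpg, list(mtcars$cyl, mtcars$gear), drop = TRUE)
positions <- t(sapply(groups, give_n))
print(positions)
```

In the plot, non-default values could then be forwarded through the `fun.args` argument of `stat_summary()`, for example `fun.args = list(min_gap = 1)`.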


Upvotes: 0
