Reputation: 1752
This might look like a duplicate of this question, but I actually want to extend the original question.
I want to annotate the boxplot with the number of observations per group AND subgroup in ggplot. Following the example of the original post, here is my minimal example:
library(ggplot2)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x) * 1.05, label = length(x)))
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text")
My problem is that the sample counts all line up at the center of each group instead of sitting over the matching box.
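For reference, stat_summary() evaluates fun.data once per combination of x (cyl) and fill (gear). The sketch below imitates that grouping in base R with split(), which is handy for checking which labels and y positions give.n produces. It is only an approximation of what ggplot2 does internally, not its actual code.

```r
# Same summary function as in the question
give.n <- function(x) {
  c(y = median(x) * 1.05, label = length(x))
}

# One vector of mpg values per cyl/gear combination; drop = TRUE removes
# combinations with no cars (mtcars has no 8-cylinder, 4-gear car)
groups <- split(mtcars$mpg, list(mtcars$cyl, mtcars$gear), drop = TRUE)

# One row per subgroup with columns y and label; rows are named "cyl.gear"
labels <- t(sapply(groups, give.n))
print(labels)
```

The rows 4.3, 4.4 and 4.5 all belong to the same x position (cyl = 4), so without a dodge position their labels are drawn at the same x.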
Reputation: 9886
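Is this what you want? The only change is position = position_dodge(width = 0.75) in stat_summary(), which shifts each label sideways onto its own box: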
library(ggplot2)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x) * 1.05, label = length(x)))
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
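A rough sketch of why this works, assuming position_dodge() spreads the n subgroups present at one x position evenly across the dodge width, in fill order: subgroup i is shifted by width * ((i - 0.5) / n - 0.5). dodge_offsets() is a hypothetical helper for illustration only, not part of ggplot2. The value 0.75 appears to match the default total box width of geom_boxplot(), which is presumably why it lines the labels up with the boxes.

```r
# Hypothetical helper (not part of ggplot2): horizontal shift of each of
# n dodged subgroups, assuming they are spread evenly over `width`
dodge_offsets <- function(n, width = 0.75) {
  i <- seq_len(n)
  width * ((i - 0.5) / n - 0.5)
}

dodge_offsets(3)  # cyl = 4 and cyl = 6 have three gears each: about -0.25, 0, 0.25
dodge_offsets(2)  # cyl = 8 only has gears 3 and 5: about -0.1875, 0.1875
```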
Reputation: 181
In case anyone else has trouble positioning the text at suitable locations, here is my modification of the answer from @MLavoie:
library(ggplot2)
give.n <- function(x){
  # Calculate the third quartile (q3) and the distance between the median and
  # q3:
  q3 <- quantile(x, probs = 0.75, names = FALSE)
  distance_between_median_and_q3 <- q3 - median(x)
  # If the distance between the median and 3rd quartile is large enough
  # (0.8 is in the units of y, here mpg), place the text halfway between
  # the median and 3rd quartile:
  if (distance_between_median_and_q3 > 0.8) {
    return(c(
      y = median(x) + distance_between_median_and_q3 / 2,
      label = length(x)))
  } else {
    # If the distance is too small, either:
    # 1) place text above the upper whisker *as long as* IQR > 0
    #    (the whisker reaches the largest value <= q3 + 1.5 * IQR),
    if (IQR(x) > 0) {
      upper_whisker <- max(x[x <= (q3 + 1.5 * IQR(x))])
      return(c(
        y = upper_whisker * 1.03,
        label = length(x)))
    } else {
      # or
      # 2) place text above the median
      return(c(
        y = median(x) * 1.03,
        label = length(x)))
    }
  }
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n,
               geom = "text",
               position = position_dodge(width = 0.75))
Note that you may have to tune the constants in give.n (the 0.8 threshold, which is in the units of y, and the 1.03 multipliers) to get good label placement for your data. As you can see, though, give.n can be made quite flexible.
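To make that tuning easier, the constants can be pulled out into arguments. The sketch below is a hypothetical variant (make_give_n, min_gap and pad are made-up names, not from the answer above) that returns a give.n-style function following the same three placement rules, with the upper whisker taken as the largest value no greater than q3 + 1.5 * IQR.

```r
# Hypothetical helper: builds a give.n-style function with the magic numbers
# (0.8 gap in y units, 3% padding) exposed as arguments
make_give_n <- function(min_gap = 0.8, pad = 1.03) {
  function(x) {
    med <- median(x)
    q3 <- quantile(x, probs = 0.75, names = FALSE)
    if (q3 - med > min_gap) {
      # enough room: halfway between the median and the third quartile
      y <- med + (q3 - med) / 2
    } else if (IQR(x) > 0) {
      # little room: just above the upper whisker
      y <- max(x[x <= q3 + 1.5 * IQR(x)]) * pad
    } else {
      # no spread at all: just above the median
      y <- med * pad
    }
    c(y = y, label = length(x))
  }
}

give_n_default <- make_give_n()
give_n_default(c(10, 10, 10))  # no spread, so y = 10 * 1.03
```

In the plot this would be used as, for example, stat_summary(fun.data = make_give_n(min_gap = 0.5), geom = "text", position = position_dodge(width = 0.75)).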