Reputation: 358
I noticed an odd behavior in ggplot (unless there's some error I'm not seeing):
library(ggplot2)
set.seed(111)
d = data.frame(x = factor(sample(1:3, size=1000, replace=TRUE)), y = rnorm(1000, 1, .5)^4)
# Note: in ggplot2 >= 3.3.0, fun.y, fun.ymin and fun.ymax are named fun, fun.min and fun.max
p = ggplot(data=d, aes(x=x, y=y)) +
  geom_jitter(alpha=.15, width=.05, size=.75) +
  stat_summary(fun.y='median', geom='point', size=2, color='red') +
  # Error bars span the interquartile range (25th to 75th percentile)
  stat_summary(aes(x=x, y=y), geom='errorbar', fun.ymin=function(z) {quantile(z, .25)}, fun.ymax = function(z) {quantile(z, .75)}, fun.y=median, color='red', width=.2)
p
I want to "zoom in" to see how the groups compare in terms of their IQRs, but then the upper quartiles change:
p + scale_y_continuous(limits=c(0, 5))
Notice that the 75th percentile for each group is plotted at around 2, but when I compute the actual percentiles, I get values closer to 3:
>aggregate(y~x, data=d, FUN=quantile, .75)
x y
1 1 3.140711
2 2 2.868939
3 3 2.842267
Is this some quirk of ggplot? Or is there an error I'm missing?
Upvotes: 1
Views: 471
Reputation: 377
This is a quirk of ggplot, as you put it. Setting limits in scale_y_continuous actually drops the rows of your data frame for which y > 5 before the summary statistics are computed. So you're getting the 75th percentile of the subset that remains:
aggregate(y~x, data=subset(d, y<5), FUN=quantile, .75)
x y
1 1 2.075563
2 2 1.709106
3 3 2.059628
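To make the mechanism concrete, here is a minimal base R sketch of what I understand the scale limits to be doing: out-of-range values are turned into NA and then dropped before the quantile is taken. The names q_full, y_censored and q_censored are mine, just for illustration, and the exact numbers you get may differ from the ones above depending on your R version's random number defaults.

```r
# Rebuild the data from the question
set.seed(111)
d <- data.frame(
  x = factor(sample(1:3, size = 1000, replace = TRUE)),
  y = rnorm(1000, 1, 0.5)^4
)

# Upper quartile of each group using all of the data
q_full <- sapply(split(d$y, d$x), quantile, probs = 0.75, names = FALSE)

# Mimic the scale limits: values outside [0, 5] become NA ...
y_censored <- ifelse(d$y >= 0 & d$y <= 5, d$y, NA)

# ... and the NAs are dropped before the quartile is computed
q_censored <- sapply(split(y_censored, d$x), quantile,
                     probs = 0.75, na.rm = TRUE, names = FALSE)

print(rbind(full = q_full, censored = q_censored))
```

The censored row should come out lower than the full row for every group, which is the same shift you see in the plot.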
To get the zoomed-in plot you want, use coord_cartesian instead of scale_y_continuous. In particular, this should work:
p + coord_cartesian(ylim = c(0, 5))
The ggplot documentation for coord_cartesian (http://ggplot2.tidyverse.org/reference/coord_cartesian.html) explains this:
The Cartesian coordinate system is the most familiar, and common, type of coordinate system. Setting limits on the coordinate system will zoom the plot (like you're looking at it with a magnifying glass), and will not change the underlying data like setting limits on a scale will.
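If you want to check the difference without eyeballing the plots, one way is to pull the computed error bar values out of each version with layer_data(). This is only a sketch: it assumes ggplot2 3.3.0 or later (hence fun, fun.min and fun.max rather than the fun.y family), keeps just the error bar layer, and the names zoomed, clipped, ymax_zoomed and ymax_clipped are mine.

```r
library(ggplot2)

set.seed(111)
d <- data.frame(
  x = factor(sample(1:3, size = 1000, replace = TRUE)),
  y = rnorm(1000, 1, 0.5)^4
)

# Only the IQR error bars, so that layer 1 holds the quartiles
p <- ggplot(d, aes(x = x, y = y)) +
  stat_summary(
    geom = "errorbar",
    fun = median,
    fun.min = function(z) quantile(z, 0.25),
    fun.max = function(z) quantile(z, 0.75),
    color = "red", width = 0.2
  )

# Zooming with the coordinate system: statistics see all the data
zoomed <- p + coord_cartesian(ylim = c(0, 5))

# Limits on the scale: out-of-range values are removed first
clipped <- p + scale_y_continuous(limits = c(0, 5))

# Upper end of each error bar, as computed by stat_summary
ymax_zoomed <- layer_data(zoomed, 1)$ymax
# suppressWarnings() hides the "removed rows" warning from the clipped plot
ymax_clipped <- suppressWarnings(layer_data(clipped, 1)$ymax)

print(rbind(coord_cartesian = ymax_zoomed, scale_limits = ymax_clipped))
```

The coord_cartesian row should match what aggregate() gives on the full data, while the scale_limits row should match the subset version.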
Upvotes: 3