Boris Gorelik

Reputation: 31777

Very strange graphs with ggplot2 geom_violin

I could only reliably reproduce this problem with a fairly large data set, so I pasted the entire code to a pastebin. Here is the code without the data part:

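# read tmp from the pastebin (a data frame with columns region and score)

library(ggplot2)
plt <- ggplot(tmp, aes(region, score))
# untrimmed violins with explicit y limits: these come out distorted
plt1 <- plt + geom_violin(scale='width', trim=FALSE) + ylim(0, 1) + ggtitle('with ylim')
# the same violins without y limits
plt2 <- plt + geom_violin(scale='width', trim=FALSE) + ggtitle('without ylim')
print(plt1)
print(plt2)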

Setting y limits for this plot results in pretty ugly "violins".

What is this, why does it happen, and how can I avoid this ugly problem?

BTW, setting trim=TRUE solves the problem.

Upvotes: 3

Views: 3008

Answers (1)

Peyton

Reputation: 7396

From a bit of digging, I think the technical source of the problem is this. Your y values only just fit within [0, 1], so the kernel density estimate will inevitably extend outside that interval. With stat_density, this excess density is simply cut off, but with geom_violin/stat_ydensity the excess is kept and the scale is allowed to extend to cover it. With your ylim and trim=FALSE, though, the computed y values outside [0, 1] are kept and set to NA, which breaks the polygon that geom_polygon draws. You can see this with a smaller example using data in [0, 1]:

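library(ggplot2)
x <- runif(1e4, 0, 1)  # uniform data strictly inside [0, 1]
# untrimmed violin plus ylim: the density tails beyond [0, 1] become NA
ggplot(mapping=aes(1, x)) + geom_violin(trim=FALSE) + ylim(0, 1)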

a broken violin plot
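One way to check the NA explanation is to inspect the data ggplot2 computes for the layer before drawing it. The sketch below assumes a ggplot2 version that exports layer_data(); the names p, d and n_na are just for illustration. It counts the rows of the violin's density grid whose y value was censored to NA by the ylim limits.

```r
library(ggplot2)

set.seed(1)                 # reproducible random sample
x <- runif(1e4, 0, 1)

# Same plot as above: untrimmed violin with scale limits set by ylim()
p <- ggplot(mapping = aes(1, x)) + geom_violin(trim = FALSE) + ylim(0, 1)

# layer_data() returns the built layer data (after the stat and scale mapping);
# density-grid points that fell outside [0, 1] have been replaced by NA
d <- layer_data(p)
n_na <- sum(is.na(d$y))
print(n_na)
```

A nonzero count here is consistent with the NA values that break the polygon.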

There are a couple of ways around this. The first is to just leave the default trim=TRUE:

ggplot(mapping=aes(1, x)) + geom_violin() + ylim(0, 1)

a violin plot with ylim and trim=TRUE

Note that ylim (which sets the limits of scale_y_continuous) actually removes any raw data outside [0, 1] before the densities are computed. Neither your example nor mine has points outside that range, but it's something to be aware of. There will also be some padding at the top and bottom, which may mislead the viewer into thinking there is no density outside [0, 1].
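To get a rough idea of how much density an untrimmed estimate places outside [0, 1], you can use base R's density(), whose cut = 3 (the default) extends the evaluation grid three bandwidths past the range of the data. This is only an approximation under the assumption that the violin's estimate behaves similarly; it uses the default "nrd0" bandwidth, which is also geom_violin's default, but it is not ggplot2's own computation.

```r
set.seed(1)
x <- runif(1e4, 0, 1)

# Kernel density estimate; the grid runs from min(x) - 3*bw to max(x) + 3*bw
d <- density(x, cut = 3)

# Riemann-sum approximation of the estimated probability mass outside [0, 1]
step <- diff(d$x[1:2])
mass_outside <- sum(d$y[d$x < 0 | d$x > 1]) * step
print(mass_outside)
```

The result is small but clearly positive, which is exactly the part of the violin that trim=TRUE hides.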

Perhaps a better solution is to use coord_cartesian, which will simply "zoom in" to the graph, leaving the data and resulting density untouched:

ggplot(mapping=aes(1, x)) + geom_violin(trim=FALSE) + coord_cartesian(ylim=c(0, 1))

a violin plot with coord_cartesian
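As a quick comparison of the two fixes, the sketch below builds both plots and counts how many y values in each layer's computed data were censored to NA. It assumes layer_data() is available; the helper na_rows() and the list names are hypothetical, chosen only for this illustration.

```r
library(ggplot2)

set.seed(1)
x <- runif(1e4, 0, 1)
p0 <- ggplot(mapping = aes(1, x))

# Hypothetical helper: number of rows whose y was censored to NA by scale limits
na_rows <- function(p) sum(is.na(layer_data(p)$y))

fixes <- list(
  trim_default = p0 + geom_violin() + ylim(0, 1),           # grid stays within data range
  zoom_only    = p0 + geom_violin(trim = FALSE) +
                   coord_cartesian(ylim = c(0, 1))           # limits only zoom the view
)
counts <- sapply(fixes, na_rows)
print(counts)
```

Both variants should report zero NA rows: trimming keeps the density grid inside the data range, while coord_cartesian never censors the data at all.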

Upvotes: 6
