user248237

How to interpret the different ggplot2 densities?

I am confused about the meaning of the following variants of geom_density in ggplot2. Can someone please explain the difference between these four calls?

  1. geom_density(aes_string(x=myvar))
  2. geom_density(aes_string(x=myvar, y=..density..))
  3. geom_density(aes_string(x=myvar, y=..scaled..))
  4. geom_density(aes_string(x=myvar, y=..count../sum(..count..)))

My understanding is that:

I find ..scaled.. very counterintuitive and, if my interpretation of it is correct, have never seen it used, so I'd like to ignore it. I am mainly looking for clarification of the difference between the default geom_density and a kind of normalized density plot, which I assume requires the ..count../... argument. Thanks.

(Related: Error with ggplot2 mapping variable to y and using stat="bin")

Upvotes: 23

Views: 11075

Answers (1)

mnel

Reputation: 115382

The default y aesthetic for stat_density is ..density.., so a call to geom_density, which uses stat_density by default, will plot y = ..density.. unless told otherwise. In other words, calls 1 and 2 in the question produce the same plot.

You can see how the various columns are calculated by looking at the source code.

..scaled.. is defined as

densdf$scaled <- densdf$y / max(densdf$y, na.rm = TRUE)

Feel free to ignore it if you wish.
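
To make the ..density.. versus ..scaled.. distinction concrete, here is a minimal base-R sketch. It assumes that stats::density() is a reasonable stand-in for what stat_density computes; the data vector x and the names step, area and peak are made up for illustration.

```r
# Illustrative data (hypothetical); any numeric vector would do.
set.seed(1)
x <- rnorm(200)

# Kernel density estimate on an evenly spaced grid.
d <- density(x)
densdf <- data.frame(x = d$x, y = d$y)

# ..density..: the curve integrates to approximately 1.
step <- diff(densdf$x)[1]
area <- sum(densdf$y) * step

# ..scaled..: the same curve divided by its maximum, so the peak is 1.
densdf$scaled <- densdf$y / max(densdf$y, na.rm = TRUE)
peak <- max(densdf$scaled)
```

The shape of the curve is identical in both columns; only the y axis changes, which is why ..scaled.. is mostly useful for comparing shapes rather than reading off probabilities.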

Looking at the source code for stat_bin, the results are computed as follows:

res <- within(results, {
    count[is.na(count)] <- 0
    density <- count / width / sum(abs(count), na.rm=TRUE)
    ncount <- count / max(abs(count), na.rm=TRUE)
    ndensity <- density / max(abs(density), na.rm=TRUE)
  })
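
The same arithmetic can be tried outside ggplot2. The sketch below is only an approximation of the idea: it uses hist() as a stand-in binning step (ggplot2 chooses its bins differently), and the data and the names h, results and total_area are illustrative assumptions.

```r
# Illustrative data (hypothetical).
set.seed(1)
x <- rnorm(200)

# Bin the data without plotting; hist() stands in for stat_bin's binning.
h <- hist(x, breaks = 10, plot = FALSE)
results <- data.frame(count = h$counts, width = diff(h$breaks))

# Same computation as in the snippet above.
res <- within(results, {
  count[is.na(count)] <- 0
  density <- count / width / sum(abs(count), na.rm = TRUE)
  ncount <- count / max(abs(count), na.rm = TRUE)
  ndensity <- density / max(abs(density), na.rm = TRUE)
})

# The bars of a density histogram have total area 1.
total_area <- sum(res$density * res$width)
```

Here density is the column comparable to a kernel density curve, while ncount and ndensity are rescaled so that the tallest bar has height 1.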

So if you want to compare geom_density with geom_histogram (using the default stat = 'bin'), set y = ..density.. in the histogram. It will calculate count / sum(count) for you and divide by the bin width, so the bars have a total area of 1, just like the density curve.

If you wanted to compare geom_density(aes(y=..scaled..)) with stat_bin, then you would use geom_histogram(aes(y = ..ndensity..)).

You could also get them on the same scale by using ..count.. in both; however, you would need to tune the adjust parameter of stat_density to get an appropriately detailed approximation of the curve.
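
As a rough illustration of the pairings described above, the following sketch overlays a histogram and a density curve on matching scales. The data frame df, the binwidth and the plot names are assumptions made for the example. Recent ggplot2 releases prefer after_stat(density) over the ..density.. spelling used in this thread, and may warn about the older form.

```r
library(ggplot2)

# Illustrative data (hypothetical).
set.seed(1)
df <- data.frame(myvar = rnorm(200))

# Both layers on the density scale: bar area and curve area are each 1.
p_density <- ggplot(df, aes(x = myvar)) +
  geom_histogram(aes(y = ..density..), binwidth = 0.25, alpha = 0.4) +
  geom_density(aes(y = ..density..))

# Both layers rescaled so that the tallest bar and the curve's peak are 1.
p_scaled <- ggplot(df, aes(x = myvar)) +
  geom_histogram(aes(y = ..ndensity..), binwidth = 0.25, alpha = 0.4) +
  geom_density(aes(y = ..scaled..))
```

Printing p_density or p_scaled draws the plot. Mixing the pairs, for example ..density.. bars with a ..scaled.. curve, puts the two layers on different y scales.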

Upvotes: 14
