I am confused about the meaning of the following variants of geom_density
in ggplot:
Can someone please explain the difference between these four calls:
geom_density(aes_string(x=myvar))
geom_density(aes_string(x=myvar, y=..density..))
geom_density(aes_string(x=myvar, y=..scaled..))
geom_density(aes_string(x=myvar, y=..count../sum(..count..)))
My understanding is that:
- geom_density alone will produce a density whose area under the curve sums to 1
- geom_density with ..density.. basically does the same... ?
- ..count../sum(..count..) will normalize the peak heights to be more like a normalized histogram, ensuring that all the heights sum to 1
- ..count.. by itself without the denominator will just multiply each bin by # of items in it
- the ..scaled.. parameter will make it so the maximum value of the density is 1.
I find ..scaled.. very counterintuitive and have never seen it used, if my interpretation of it is correct, so I'd like to ignore that. I am mainly looking for a clarification of the differences between geom_density and a kind of normalized density plot, which I am assuming requires the ..count../... argument. Thanks.
(Related: Error with ggplot2 mapping variable to y and using stat="bin")
Upvotes: 23
Views: 11075
Reputation: 115382
The default aesthetic for stat_density is ..density.., so a call to geom_density, which uses stat_density by default, will plot y = ..density.. by default.
You can see how the various columns are calculated by looking at the source code. ..scaled.. is defined as
densdf$scaled <- densdf$y / max(densdf$y, na.rm = TRUE)
Feel free to ignore it if you wish.
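As a rough illustration of what those columns hold, here is a base R sketch that mimics them with stats::density instead of calling ggplot2. The sample x is made up, and the count line assumes (from the ggplot2 documentation, not from the snippet above) that stat_density defines count as density times the number of points.

```r
# Sketch only: mimic the stat_density columns with base R.
set.seed(1)
x <- rnorm(200)                # hypothetical sample data

d <- density(x, n = 512)       # kernel density estimate on a regular grid
dens   <- d$y                  # what ..density.. would hold
scaled <- dens / max(dens, na.rm = TRUE)  # ..scaled..: peak rescaled to 1
count  <- dens * length(x)     # ..count.. (assumed: density * number of points)

step <- diff(d$x)[1]           # grid spacing, used for a Riemann sum
area_density <- sum(dens) * step    # close to 1
area_count   <- sum(count) * step   # close to length(x)
```

So ..density.. integrates to roughly 1, ..scaled.. only changes the vertical scale so the peak is 1, and ..count.. integrates to roughly the number of observations.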
Looking at the source code for stat_bin, the results are computed as follows:
res <- within(results, {
  count[is.na(count)] <- 0
  density <- count / width / sum(abs(count), na.rm=TRUE)
  ncount <- count / max(abs(count), na.rm=TRUE)
  ndensity <- density / max(abs(density), na.rm=TRUE)
})
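To make those formulas concrete, here is a small base R sketch that applies them to the output of hist(). It is not the ggplot2 code path; the data and the number of breaks are arbitrary choices for illustration.

```r
# Sketch only: apply the stat_bin formulas to a base R histogram.
set.seed(1)
x <- rnorm(200)                          # hypothetical sample data
h <- hist(x, breaks = 20, plot = FALSE)  # bin the data without plotting

count    <- h$counts
width    <- diff(h$breaks)               # width of each bin
density  <- count / width / sum(abs(count), na.rm = TRUE)
ncount   <- count / max(abs(count), na.rm = TRUE)
ndensity <- density / max(abs(density), na.rm = TRUE)

total_area <- sum(density * width)       # bar areas add up to 1
```

Note that it is the bar areas (density times width) that sum to 1, not the bar heights. ncount and ndensity both rescale so that the tallest bar is 1.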
So if you want to compare against the results of geom_histogram (using the default stat = 'bin'), you can set y = ..density.. and it will calculate count / sum(count) for you, divided by the bin width so that the bar areas sum to 1.
If you wanted to compare geom_density(aes(y=..scaled..)) with stat_bin, then you would use geom_histogram(aes(y = ..ndensity..))
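The two pairings above could be sketched roughly like this. The data frame df and the choice of 30 bins are made up for the example. The sketch also assumes ggplot2 3.3.0 or later and writes after_stat(density) where the thread writes ..density..; both spellings refer to the same computed variable.

```r
# Sketch only: overlay a histogram and a density curve on matching scales.
library(ggplot2)

set.seed(1)
df <- data.frame(myvar = rnorm(200))   # hypothetical data

# Both layers on the density scale (total area of 1).
p_density <- ggplot(df, aes(x = myvar)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  geom_density()

# Both layers rescaled so that the maximum is 1.
p_scaled <- ggplot(df, aes(x = myvar)) +
  geom_histogram(aes(y = after_stat(ndensity)), bins = 30, alpha = 0.4) +
  geom_density(aes(y = after_stat(scaled)))
```

Printing p_density or p_scaled draws the plot. layer_data(p_scaled, 2) shows the computed columns if you want to inspect the numbers.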
You could also get them on the same scale by using ..count.. in both; however, you would need to tune the adjust parameter in stat_density to get an appropriately detailed approximation of the curve.
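One caveat when putting ..count.. on both layers, offered as a sketch rather than as part of the answer above: as far as I understand, stat_bin's count is a per-bin tally, while stat_density's count is density times the number of points. The two only line up if the density's count is also multiplied by the bin width. The binwidth of 0.25 and the data frame df below are arbitrary, and after_stat() assumes ggplot2 3.3.0 or later.

```r
# Sketch only: put a count histogram and a density curve on the same scale.
library(ggplot2)

set.seed(1)
df <- data.frame(myvar = rnorm(200))   # hypothetical data
bw <- 0.25                             # hypothetical bin width

p_count <- ggplot(df, aes(x = myvar)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = bw, alpha = 0.4) +
  # density count is (density * n); multiply by the bin width to match the bars
  geom_density(aes(y = after_stat(count) * bw), adjust = 1)
```

The adjust argument then only controls how smooth the curve is (smaller values follow the bars more closely); it does not change the overall scale.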
Upvotes: 14