How does ggplot2 density differ from the density function?

Question

Why do the following plots look different? Both methods appear to use Gaussian kernels.

How does ggplot2 compute a density?

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, n=2000)
ggplot(NULL, aes(x=d$x, y=d$y)) + geom_line() + scale_x_log10()

ggplot(vehicles, aes(x=cty)) + geom_density() + scale_x_log10()

UPDATE:

A solution to this question already appears on SO here; however, the specific parameters that ggplot2 passes to the R stats density function remain unclear.

An alternate solution is to extract the density data straight from the ggplot2 plot, as shown here
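For illustration, a minimal sketch of pulling the computed density out of a plot with layer_data(). The vector cty_demo is a simulated stand-in for vehicles$cty (an assumption, so that the snippet runs without the fueleconomy package).

```r
library(ggplot2)

# Simulated stand-in for vehicles$cty (assumption: any positive numeric vector works).
set.seed(1)
cty_demo <- rlnorm(500, meanlog = log(17), sdlog = 0.3)

p <- ggplot(data.frame(cty = cty_demo), aes(x = cty)) + geom_density()

# layer_data() builds the plot and returns the data frame computed for the first layer.
dens_df <- layer_data(p)

# x holds the evaluation grid, density the estimated values.
print(head(dens_df[, c("x", "density")]))
```

The returned data frame can then be compared directly with the x and y components of a density() result.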

user20650 · Accepted Answer

In this case, it is not the density calculation that is different but how the log10 transform is applied.

First, check that the densities are similar without the transform:

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, from=min(vehicles$cty), to=max(vehicles$cty))
ggplot(data.frame(x=d$x, y=d$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line")

So the issue seems to be the transform. In the stat_density call below, the log10 transform appears to be applied to the x variable before the density is calculated. To reproduce the result manually, you therefore have to transform the variable before calculating the density, e.g.:

d2 <- density(log10(vehicles$cty), from=min(log10(vehicles$cty)),
              to=max(log10(vehicles$cty)))
ggplot(data.frame(x=d2$x, y=d2$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line") + scale_x_log10()
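The visual match can also be checked numerically. The following is only a sketch: cty_demo is a simulated stand-in for vehicles$cty (an assumption), and the comparison relies on layer_data() returning x on the transformed (log10) scale, which is how ggplot2 behaves in the versions I am aware of.

```r
library(ggplot2)

# Simulated stand-in for vehicles$cty (assumption: any positive numeric vector works).
set.seed(1)
cty_demo <- rlnorm(500, meanlog = log(17), sdlog = 0.3)

# Density as computed by ggplot2 under a log10 x scale.
p <- ggplot(data.frame(cty = cty_demo), aes(x = cty)) +
  stat_density(geom = "line") +
  scale_x_log10()
gg <- layer_data(p)

# Manual equivalent: transform first, then estimate over the same range and grid size.
lx <- log10(cty_demo)
d2 <- density(lx, from = min(lx), to = max(lx), n = nrow(gg))

# Largest absolute differences in grid positions and in density values.
max_x_diff <- max(abs(gg$x - d2$x))
max_y_diff <- max(abs(gg$density - d2$y))
print(c(x = max_x_diff, y = max_y_diff))
```

If both differences are negligible, ggplot2 is estimating the density of log10(cty), not the density of cty drawn on a log axis. The two curves are not expected to coincide, because a change of variable to y = log10(x) rescales a density by the factor x * log(10).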

PS: To see how ggplot2 prepares the data for the density, you can look at the code: as.list(StatDensity) leads to StatDensity$compute_group, which in turn leads to ggplot2:::compute_density.
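A short sketch of that inspection follows. Note that compute_density is an internal, unexported function, so its arguments and body may differ between ggplot2 versions.

```r
library(ggplot2)

# The ggproto method that stat_density() runs once per group.
print(StatDensity$compute_group)

# compute_density is not exported, so fetch it from the package namespace.
compute_density <- utils::getFromNamespace("compute_density", "ggplot2")

# The argument list shows the defaults that end up being passed on to stats::density().
print(args(compute_density))
```

Reading the argument defaults is a quick way to see which bandwidth, kernel and grid size ggplot2 hands to the stats density function.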
