Reputation: 155
I am using the density function in R and then computing some results from the obtained densities. After that, I use ggplot2 to display the PDFs of the same data.
However, the results differ slightly from what the ggplot2 plot shows, which I confirmed by plotting the density output directly (using plot {graphics}).
Any idea why? How can I correct it so that the results and the ggplot2 plot match, i.e. are computed from exactly the same data?
An example of this (code and images):
srcdata = data.frame("Value" = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126, 4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582, 3.3162, 2.2995, 4.3954, 1.8488), "Type" = c("Positive", "Negative", "Positive", "Negative", "Negative", "Positive", "Positive", "Negative", "Positive", "Positive", "Negative", "Negative", "Negative", "Negative", "Positive", "Positive", "Positive", "Negative", "Positive", "Negative"))
# bandwidth estimated from all values, ignoring the grouping
bwidth <- density(srcdata$Value)$bw
# one vector of values per Type
sample <- split(srcdata$Value, srcdata$Type)[1:2]
# evaluation range, padded by 20% on each side
xmin = min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax = max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))
# per-group densities, all using the shared bandwidth
densities <- lapply(sample, density, bw = bwidth, n = 512, from = xmin, to = xmax)
# plotting densities result
plot(densities[[1]], xlim = c(xmin, xmax), col = "steelblue", main = "")
lines(densities[[2]], col = "orange")
# plot using ggplot2
library(ggplot2)
ggplot(data = srcdata, aes(x = Value)) + geom_density(aes(group = Type, colour = Type)) + xlim(xmin, xmax)
# or with ggplot2 (using easyGgplot2)
library(easyGgplot2)
ggplot2.density(data = srcdata, xName = 'Value', groupName = 'Type', alpha = 0.5, xlim = c(xmin, xmax))
Upvotes: 3
Views: 1106
Reputation: 602
The current comments correctly identify that you are using two different bandwidths to calculate densities in your two plots: the plot() graph uses the bwidth you specified as the bandwidth, and the ggplot() graph uses the default bandwidth. Ideally you would pass bwidth to the ggplot graph and that would solve everything; however, the commentary around an SO question here suggests that you can't pass a bandwidth parameter to stat_density or geom_density.
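The bandwidth difference can be checked in base R by comparing the bandwidth taken from all values with the default bandwidth of each group. This is only a minimal sketch: it assumes geom_density() applies the same default rule as density() (bw.nrd0) separately to each group, and the name group_bw is introduced here purely for illustration.

```r
# Data copied from the question
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)

# Bandwidth used in the manual calculation: estimated from all 20 values
bwidth <- density(srcdata$Value)$bw

# Default bandwidth of each group on its own (assumed to be what ggplot2 uses)
groups <- split(srcdata$Value, srcdata$Type)
group_bw <- sapply(groups, bw.nrd0)

print(bwidth)
print(group_bw)
```

Because the pooled data is bimodal, its spread is much larger than the spread within either group, so bwidth comes out larger than both per-group values and the plot() curves are smoother than the ggplot ones.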
The easiest thing to do to get the same output in both graphs is to let density() determine the optimal bandwidth in both your manual density calculation (below) and in the ggplot graph (using the same code you already have):
densities <- lapply(sample, density, n = 512, from = xmin, to = xmax)
Alternatively, the actual bandwidth used in geom/stat_density is the pre-determined bandwidth times the adjust parameter (density documentation), so you could specify an adjust value in stat_density (stat_density documentation) to try to bring the ggplot bandwidth close to your bwidth variable. The match cannot be exact, because the default bandwidth is computed separately for each group, so a single adjust multiplier cannot map both groups onto bwidth. I found that an adjust value of 4.5 gives a similar (but not exact) version of the original graph produced with your calculated densities:
ggplot(data = srcdata, aes(x = Value)) +
  geom_density(aes(group = Type, colour = Type), adjust = 4.5) +
  xlim(xmin, xmax)
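To see where a value like 4.5 comes from, and why it is only approximate, one can compute the multiplier each group would need on its own. This is a rough sketch in base R, again assuming that ggplot2 starts from the per-group bw.nrd0 bandwidth; needed_adjust is a hypothetical name used only here.

```r
# Data copied from the question
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)

# Target bandwidth: the one used in the manual density() calls
bwidth <- density(srcdata$Value)$bw

# Per-group default bandwidths (assumed starting point inside ggplot2)
group_bw <- sapply(split(srcdata$Value, srcdata$Type), bw.nrd0)

# adjust multiplies the default bandwidth, so each group needs
# adjust = target / default to reproduce bwidth exactly
needed_adjust <- bwidth / group_bw
print(needed_adjust)
```

The two groups need different multipliers, so any single adjust value is a compromise between them.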
EDIT
You may find the answer to this question helpful if you want to specifically adjust your ggplot graph so that it uses your bwidth variable as the bandwidth in the density smoothing: Understanding bandwidth smoothing in ggplot2
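If your installed ggplot2 is recent enough that geom_density() accepts a bw argument, the fixed bandwidth can be passed directly. Treat this as an assumption about your version: it contradicts the older discussion referenced above, so check ?geom_density first. The sketch below reuses the question's data and range.

```r
library(ggplot2)

# Data copied from the question
srcdata <- data.frame(
  Value = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488),
  Type = c("Positive", "Negative", "Positive", "Negative", "Negative",
           "Positive", "Positive", "Negative", "Positive", "Positive",
           "Negative", "Negative", "Negative", "Negative", "Positive",
           "Positive", "Positive", "Negative", "Positive", "Negative")
)

# Same bandwidth and plotting range as in the manual calculation
bwidth <- density(srcdata$Value)$bw
xmin <- min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax <- max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

# Assumption: this ggplot2 version supports bw in geom_density()
p <- ggplot(srcdata, aes(x = Value, colour = Type)) +
  geom_density(bw = bwidth) +
  xlim(xmin, xmax)
print(p)
```

With the bandwidth fixed this way, both groups are smoothed with bwidth, which is what the manual lapply() call with bw = bwidth does.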
Upvotes: 3