Panda

Reputation: 155

R plot density ggplot vs plot

I am using the density function in R and then computing some results from the obtained densities. After that, I use ggplot2 to display the PDFs of the same data.

However, the results are slightly different from what is shown in the respective plot - something that is confirmed by plotting the density output directly (using plot {graphics}).

Any idea why? How can I correct it so that the results and the ggplot2 plot match, i.e. come from exactly the same data?

An example of this (code and images):

srcdata = data.frame("Value" = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126, 4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582, 3.3162, 2.2995, 4.3954, 1.8488), "Type" = c("Positive", "Negative", "Positive", "Negative", "Negative", "Positive", "Positive", "Negative", "Positive", "Positive", "Negative", "Negative", "Negative", "Negative", "Positive", "Positive", "Positive", "Negative", "Positive", "Negative"))

bwidth <- ( density ( srcdata$Value ))$bw

sample <- split ( srcdata$Value, srcdata$Type )[ 1:2 ]

xmin = min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax = max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

densities <- lapply ( sample, density, bw = bwidth, n = 512, from = xmin, to = xmax )

#plotting densities result
plot( densities [[ 1 ]], xlim = c(xmin,xmax), col = "steelblue", main = "" )
lines ( densities [[ 2 ]], col = "orange" )

#plot using ggplot2
library(ggplot2)
ggplot(data = srcdata, aes(x=Value)) + geom_density(aes(group=Type, colour=Type)) + xlim(xmin, xmax)

#or with ggplot2 (using easyGgplot2)
library(easyGgplot2)
ggplot2.density(data=srcdata, xName='Value', groupName='Type', alpha=0.5, xlim=c(xmin,xmax))

Upvotes: 3

Views: 1106

Answers (1)

tsurudak

Reputation: 602

The comments correctly identify that your two plots use different bandwidths: the plot() graph uses the bwidth you specified (estimated once from the pooled data), while the ggplot() graph uses the default bandwidth, which is estimated separately for each group. Ideally you would pass bwidth to the ggplot graph and that would solve everything. However, the commentary around an SO question here suggests that you can't pass a bandwidth parameter to stat_density or geom_density.
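
To see the mismatch numerically, here is a minimal sketch comparing the pooled bandwidth with the per-group defaults. It assumes that geom_density falls back on the same default rule as density(), i.e. bw.nrd0(), applied to each group on its own; the names bw_pooled and bw_by_group are only illustrative and do not appear in the question.

```r
# Values from the question
value <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
           4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
           3.3162, 2.2995, 4.3954, 1.8488)
# In this data set every "Positive" value is above 3 and every "Negative"
# value is below 3, so this reproduces the Type labels from the question.
type <- ifelse(value > 3, "Positive", "Negative")

# Bandwidth used for plot(): a single value estimated from the pooled data
bw_pooled <- density(value)$bw

# Bandwidth the default rule picks for each group separately
# (assumption: this is what the ggplot graph ends up using)
bw_by_group <- sapply(split(value, type), bw.nrd0)

print(bw_pooled)
print(bw_by_group)
```

Because the pooled data is bimodal, the pooled bandwidth should come out several times larger than either per-group value, which is why the plot() curves look much smoother.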

The easiest way to get the same output in both graphs is to let density() choose its default bandwidth both in your manual density calculation (below) and in the ggplot graph (using the code you already have):

densities <- lapply ( sample, density, n = 512, from = xmin, to = xmax )

Alternatively, the bandwidth actually used in geom_density/stat_density is the default bandwidth multiplied by the adjust parameter (density documentation), so you could specify an adjust value in stat_density (stat_density documentation) to bring the ggplot bandwidth close to your bwidth variable. I found that an adjust value of 4.5 gives a similar (but not exact) version of the original graph produced with your calculated densities:

ggplot(data = srcdata, aes(x=Value)) + 
    geom_density(aes(group=Type, colour=Type), adjust = 4.5) +
    xlim(xmin, xmax)

Adjusted ggplot density graph
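
Why a single adjust value can only approximate the original plot can be sketched as follows, again assuming that ggplot's default per-group bandwidth is bw.nrd0(). The adjust needed to reproduce bwidth for a group is bwidth divided by that group's default bandwidth, and the ratio is not the same for both groups. The names default_bw, adjust_needed and check are illustrative.

```r
# Values from the question
value <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
           4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
           3.3162, 2.2995, 4.3954, 1.8488)
# Every "Positive" value in this data set is above 3, every "Negative" below
type <- ifelse(value > 3, "Positive", "Negative")
groups <- split(value, type)

# Pooled bandwidth, as computed in the question
bwidth <- density(value)$bw

# Default bandwidth per group and the adjust factor that would scale it to bwidth
default_bw <- sapply(groups, bw.nrd0)
adjust_needed <- bwidth / default_bw
print(adjust_needed)

# Sanity check: density() with the default rule and the group's own
# adjust factor ends up with a bandwidth equal to bwidth
check <- mapply(function(x, a) density(x, adjust = a)$bw, groups, adjust_needed)
print(check)
```

Since geom_density takes one adjust value for the whole layer, a compromise such as 4.5 gets close for both groups but matches neither exactly.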

EDIT: You may find the answer to this question helpful if you specifically want your ggplot graph to use your bwidth variable as the bandwidth in the density smoothing: Understanding bandwidth smoothing in ggplot2
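
Another route that does not depend on how ggplot chooses its bandwidth is to plot the density() output itself, so the numbers and the picture come from exactly the same data. The sketch below is one possible way to do that, not the method from the linked answer; dens_df and its column names are made up for illustration, and the plotting step is skipped if ggplot2 is not installed.

```r
# Values from the question
value <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
           4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
           3.3162, 2.2995, 4.3954, 1.8488)
# Every "Positive" value in this data set is above 3, every "Negative" below
type <- ifelse(value > 3, "Positive", "Negative")

# Same bandwidth and evaluation range as in the question
bwidth <- density(value)$bw
xmin <- min(value) - 0.2 * abs(min(value))
xmax <- max(value) + 0.2 * abs(max(value))

densities <- lapply(split(value, type), density,
                    bw = bwidth, n = 512, from = xmin, to = xmax)

# Stack the x/y grids of the density objects into one long data frame
dens_df <- do.call(rbind, lapply(names(densities), function(g) {
  data.frame(Type = g, Value = densities[[g]]$x, Density = densities[[g]]$y)
}))

# Draw the precomputed curves as lines instead of letting ggplot re-estimate them
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(dens_df,
                       ggplot2::aes(x = Value, y = Density, colour = Type)) +
    ggplot2::geom_line()
  print(p)
}
```

Newer ggplot2 releases also appear to accept a bw argument in geom_density(); if your version does, geom_density(aes(group = Type, colour = Type), bw = bwidth) should be the more direct fix.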

Upvotes: 3