Reputation: 1641
I'm approximating a distribution with gaussian mixtures and was wondering whether there was an easy way to automatically plot the estimated kernel density of the whole (uni-dimensional) dataset as the sum of the component densities in a nice fashion like this using ggplot2:
Given the following example data, my approach in ggplot2 would be to manually plot the subset densities into the scaled overall density like this:
#example data
a<-rnorm(1000,0,1) #component 1
b<-rnorm(1000,5,2) #component 2
d<-c(a,b) #overall data
df<-data.frame(d,id=rep(c(1,2),each=1000)) #add group id
##ggplot2
require(ggplot2)
ggplot(df) +
geom_density(aes(x=d,y=..scaled..)) +
geom_density(data=subset(df,id==1), aes(x=d), lty=2) +
geom_density(data=subset(df,id==2), aes(x=d), lty=4)
Note that this does not work out regarding the scales. It also does not work when you scale all 3 densities or no density at all. So I was not able to replicate above plot.
In addition, I am not able to automatically generate this plot without having to subset manually. I tried using position = "stacked" as parameter in geom_density.
I usually have around 5-6 Components per dataset, so manually subsetting would be possible. However, I would like to have different colors or line-types per component density which are displayed in the legend of ggplot, so doing all subsets manually would increase the workload quite a bit.
Any ideas? Thanks!
Upvotes: 2
Views: 3704
Reputation: 19716
Here is a possible solution by specifying each density in the aes
call with position = "identity"
in one layer and in the second layer using stacked density without the legend.
ggplot(df) +
stat_density(aes(x = d, linetype = as.factor(id)), position = "stack", geom = "line", show.legend = F, color = "red") +
stat_density(aes(x = d, linetype = as.factor(id)), position = "identity", geom = "line")
Do note that when using more then two groups:
a <- rnorm(1000, 0, 1)
b <- rnorm(1000, 5, 2)
c <- rnorm(1000, 3, 2)
d <- rnorm(1000, -2, 1)
d <- c(a, b, c, d)
df <- data.frame(d, id = as.factor(rep(c(1, 2, 3, 4), each = 1000)))
curves for each stack appear (this is a problem with the two group example but linetype
in first layer disguised it - use group
instead to check) :
gplot(df) +
stat_density(aes(x = d, group = id), position = "stack", geom = "line", show.legend = F, color = "red") +
stat_density(aes(x = d, linetype = id), position = "identity", geom = "line")
A relatively easy fix to this is to add alpha mapping and manually set it to 0 for unwanted curves:
ggplot(df) +
stat_density(aes(x=d, alpha = id), position = "stack", geom = "line", show.legend = F, color = "red") +
stat_density(aes(x=d, linetype = id), position = "identity", geom = "line")+
scale_alpha_manual(values = c(1,0,0,0))
Upvotes: 2