Reputation: 233
I was working with a dataset that consists of two groups of observations whose values are integers. I wanted to plot the density of each group to get a sense of how the groups are distributed over the values.
One group came out with a 'smooth' density while the other came out 'wavy'. I know this has something to do with the bandwidth and the fact that my data only takes discrete values, but I would love it if someone could explain exactly why.
Here's an example:
data2 <- rbind(
  data.frame(group=rep('poisson1', 1000), value = rpois(1000, 5)),
  data.frame(group=rep('poisson2', 1000), value = rpois(1000, 45)))
library(ggplot2)
ggplot(data2, aes(x=value, fill=group)) +
  geom_density()
And strangely, when I create that data frame again to get a new sample, the plot is sometimes smooth.
Upvotes: 5
Views: 1902
Reputation: 28339
The observed smoothness (or lack of it) is "caused" by the rpois() function. Its lambda argument is the (non-negative) mean of the distribution you want to sample from. Therefore, when you pass a lambda close to zero (as in rpois(1000, 5)), it generates fewer unique values, because the distribution is bounded below by zero.
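As a quick check of that claim, here is a minimal sketch that counts the distinct values in one sample for each of the two lambdas from the question. The seed and the name n_unique are arbitrary choices for illustration, and the exact counts will vary from sample to sample.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Number of distinct integer values in 1000 draws for each lambda
n_unique <- sapply(c(5, 45), function(lambda) {
  length(unique(rpois(1000, lambda)))
})
names(n_unique) <- c("lambda = 5", "lambda = 45")
n_unique
```

The sample with lambda = 5 should spread over noticeably fewer integers than the one with lambda = 45.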
Consider this example:
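library(ggplot2)

nValue <- 1e3                          # observations per lambda
nLambda <- c(1:9, seq(10, 100, 10))    # lambdas from 1 to 100
# Draw one Poisson sample per lambda and tag each row with its lambda
foo <- lapply(nLambda, function(lambda) {
  data.frame(value = rpois(nValue, lambda), lambda)
})
data <- do.call(rbind, foo)
# One density curve per lambda
ggplot(data, aes(value, group = lambda, color = lambda)) +
  geom_density()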
We can see that lambda values closer to zero produce peaks, while values further from zero produce smoother lines.
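The same contrast can be put into numbers without a plot. The sketch below counts local maxima in the output of base R's density() with its default settings, for one small and one large lambda. The helper count_peaks() is a hypothetical name introduced here, and the seed and the two lambdas are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical helper: count local maxima of the default kernel density
# estimate, i.e. places where the slope switches from rising to falling
count_peaks <- function(x) {
  y <- density(x)$y
  sum(diff(sign(diff(y))) == -2)
}

peaks_small <- count_peaks(rpois(1000, 1))    # lambda close to zero
peaks_large <- count_peaks(rpois(1000, 100))  # lambda far from zero
c(small = peaks_small, large = peaks_large)
```

With lambda = 1 each of the most common integers should show up as its own peak, while the lambda = 100 estimate is expected to have far fewer.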
You can also test this by looking at the variance within each lambda group:
ggplot(aggregate(data$value, list(data$lambda), var), aes(Group.1, x)) +
  geom_line() +
  geom_point() +
  labs(x = "Lambda",
       y = "Variance")
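One way to connect the variance to the waviness: by default, geom_density() chooses its bandwidth with the same rule of thumb as density(), available as stats::bw.nrd0(), and that rule scales with the spread of the sample. The sketch below computes that bandwidth for a few lambdas, including the two from the question. The seed and the chosen lambdas are arbitrary, and the exact values change with every sample.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

lambdas <- c(1, 5, 10, 45, 100)
# Default rule-of-thumb bandwidth for one sample of 1000 draws per lambda
bw <- sapply(lambdas, function(lambda) bw.nrd0(rpois(1000, lambda)))
data.frame(lambda = lambdas, bandwidth = round(bw, 2))
```

Since the data sit on integers, the gap between neighbouring values is 1. When the bandwidth is well below 1, each integer tends to get its own bump; when it is above 1, the bumps merge into a smooth curve. For lambda = 5 the bandwidth lands in between, which may be why resampling sometimes gives a smooth plot and sometimes a wavy one.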
Upvotes: 3