Reputation: 6538
How can I group a density plot and have the density of each group sum to one, when using weighted data?
The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing each weight by the sum of the weights. But when the data are grouped, this means that the combined density of the groups totals one. I would like the density of each group to total one.
I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:
library(ggplot2)
library(ggplot2movies) # load the movies dataset
m <- ggplot()
# each layer gets its own subset, so sum(votes) is taken within that subset
m + geom_density(data = movies[movies$Action == 0, ],
                 aes(rating, weight = votes/sum(votes)), fill=NA, colour="black") +
  geom_density(data = movies[movies$Action == 1, ],
               aes(rating, weight = votes/sum(votes)), fill=NA, colour="blue")
Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column for the total votes per Action group, dividing by that instead:
library(data.table)
movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# total votes within each Action group, added by reference as a new column
movies.dt[, votes.per.group := sum(votes), by = Action]
m <- ggplot(movies.dt, aes(x=rating, weight=votes/votes.per.group, group = Action, colour = Action))
m + geom_density(fill=NA)
Are there neater ways to do this? Because of the size of my tables, I'd rather not replicate rows by their weighting for the sake of using frequency.
Upvotes: 3
Views: 5654
Reputation: 26
Using dplyr
library(dplyr)
library(ggplot2)
library(ggplot2movies)
movies %>%
  group_by(Action) %>%
  mutate(votes.grp = sum(votes)) %>%  # total votes within each Action group
  ggplot(aes(x=rating, weight=votes/votes.grp, group = Action, colour = Action)) +
  geom_density()
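To check that weights normalised within each group give each group a density that integrates to about one, here is a minimal base-R sketch. It assumes a small made-up data frame in place of movies; the toy values, the fixed bandwidth bw = 0.5 and the helper group_area are illustrative assumptions, not anything taken from ggplot2 itself.

```r
# Toy data (hypothetical values) standing in for the movies table.
df <- data.frame(
  rating = c(5.1, 6.3, 7.0, 4.2, 8.5, 6.8),
  votes  = c(10, 30, 60, 5, 15, 80),
  Action = c(0, 0, 0, 1, 1, 1)
)

# Weights normalised within each Action group (same idea as votes/votes.grp).
df$w <- df$votes / ave(df$votes, df$Action, FUN = sum)

# Hypothetical helper: approximate area under one group's weighted density
# curve with a Riemann sum over the evaluation grid returned by density().
group_area <- function(d) {
  est <- density(d$rating, weights = d$w, bw = 0.5)  # bw fixed for illustration
  sum(est$y) * diff(est$x[1:2])
}

areas <- sapply(split(df, df$Action), group_area)
areas  # both values should be close to 1
```

The bandwidth is fixed only so the sketch does not depend on automatic bandwidth selection; any reasonable value should give areas near one.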
Upvotes: 1
Reputation: 59425
I think an auxiliary table might be your only option. I had a similar problem here. The issue, it seems, is that when ggplot uses aggregating functions in aes(...), it applies them to the whole dataset, not the subsetted data. So when you write
aes(weight=votes/sum(votes))
the votes in the numerator is subsetted based on Action, but the votes in the denominator, sum(votes), is not. The same is true for the implicit grouping with facets.
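The difference between the two denominators can be seen on a toy table. This is a minimal base-R sketch, assuming made-up values in place of movies; the column names w_global and w_group are hypothetical.

```r
# Toy data (hypothetical values): group 0 has 100 votes, group 1 has 300.
df <- data.frame(
  votes  = c(10, 30, 60, 50, 150, 100),
  Action = c(0, 0, 0, 1, 1, 1)
)

# Denominator taken over the whole table, as described above for
# votes/sum(votes) inside aes().
df$w_global <- df$votes / sum(df$votes)

# Denominator taken within each Action group: ave() applies sum() per group
# and returns a vector as long as the input.
df$w_group <- df$votes / ave(df$votes, df$Action, FUN = sum)

global_totals <- tapply(df$w_global, df$Action, sum)  # about 0.25 and 0.75
group_totals  <- tapply(df$w_group, df$Action, sum)   # 1 for each group
```

With the whole-table denominator the group totals only add up to one jointly; with the per-group denominator each group totals one on its own, which is what the auxiliary column achieves.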
If someone else has a way around this I'd love to hear it.
Upvotes: 1