Reputation: 361
I have a data frame which has two types of 'groups,' the densities of which I would like to overlay on the same graph.
Using ggplot, I tried to plot the density with the following two lines of code:
full$group <- factor(full$group)
ggplot(full, aes(x=income, fill=group)) + geom_density()
The issue is that it does not take the frequency variable (freq) into account and simply counts the rows itself. That is a problem because there is exactly one row for every income-group combination.
I believe I have two options, each of which has a question:
a) Should I plot the graph using the way the data is currently formatted? If so, how would I do that?
b) Should I reformat the data to make the frequency of each group/income combination equivalent to the freq variable assigned to it? If so, how would I do that?
This is the kind of graph I would like, where "income" = "rating" and "group" = "cond":
dput of 'full':
full <- structure(list(income = c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05, 10000, 19000,29000, 39000, 49000, 75000, 99000, 1e+05),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("one", "two"), class = "factor"),
freq = c(1237, 1791, 743, 291, 256, 212, 29, 11, 921, 1512, 614, 301, 209, 223, 48, 1)), .Names = c("income", "group", "freq"),
row.names = c(NA, 16L), class = "data.frame")
Upvotes: 2
Views: 1163
Reputation: 206253
You can repeat the observations by their frequency with:
ggplot(full[rep(seq_len(nrow(full)), full$freq), ]) +
  geom_density(aes(x=income, fill=group), color="black", alpha=.75, adjust=4)
Of course, with your data this produces a pretty lousy plot.
When estimating a density, your data should be observations from a continuous distribution. Here you really have a discrete distribution with repeated observations (in a true continuous distribution, the probability of seeing any value more than once is 0).
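To make the "discrete with repeated observations" point concrete, here is a small sketch in base R that expands the rows by freq and then counts how many distinct income values remain. The data frame is rebuilt from the dput in the question; the name `expanded` is just an illustrative choice.

```r
# Rebuild the question's data frame (values copied from the dput in the question).
full <- data.frame(
  income = rep(c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05), 2),
  group  = factor(rep(c("one", "two"), each = 8)),
  freq   = c(1237, 1791, 743, 291, 256, 212, 29, 11,
             921, 1512, 614, 301, 209, 223, 48, 1)
)

# Repeat each row index freq times, so there is one row per counted observation.
expanded <- full[rep(seq_len(nrow(full)), full$freq), ]

# The expanded data has as many rows as the total frequency ...
nrow(expanded) == sum(full$freq)

# ... but still only 8 distinct income values, each repeated many times.
length(unique(expanded$income))
table(expanded$income, expanded$group)
```

Thousands of rows sharing just 8 values is what makes the kernel density estimate come out as a series of spikes unless it is heavily smoothed.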
You could try to smooth this curve by setting the adjust= parameter to a number greater than 1 (like 3 or 4). But really, your input data is just not in an appropriate form for a density plot. A bar plot would be a better choice. Maybe something like:
ggplot(full, aes(as.factor(income), freq, fill=group)) +
geom_bar(stat="identity", position="dodge")
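If you would still rather keep the data in its current one-row-per-combination form (option a in the question), one alternative that is not part of the answer above is the weight aesthetic, which geom_density understands. The sketch below assumes you want each group's curve to integrate to 1, so freq is normalised within each group first; the column name `w` is made up for illustration.

```r
library(ggplot2)

# Rebuild the question's data frame (values copied from the dput in the question).
full <- data.frame(
  income = rep(c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05), 2),
  group  = factor(rep(c("one", "two"), each = 8)),
  freq   = c(1237, 1791, 743, 291, 256, 212, 29, 11,
             921, 1512, 614, 301, 209, 223, 48, 1)
)

# Hypothetical helper column: freq rescaled so the weights sum to 1 per group.
full$w <- ave(full$freq, full$group, FUN = function(f) f / sum(f))

# Weighted density on the unexpanded data; adjust = 4 mirrors the smoothing used above.
p <- ggplot(full, aes(x = income, fill = group, weight = w)) +
  geom_density(color = "black", alpha = .75, adjust = 4)
print(p)
```

As far as I know, the default bandwidth here is chosen from the 8 income values per group without regard to the weights, so the curves will probably not match the row-expansion version exactly. The caveat about discrete data applies either way.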
Upvotes: 2