zebrainatree

Reputation: 361

How to ggplot two groups of income-segment populations and values

I have a data frame which has two types of 'groups,' the densities of which I would like to overlay on the same graph.

Using ggplot, I tried to plot the density with the following two lines of code:

full$group <- factor(full$group)

ggplot(full, aes(x=income, fill=group)) + geom_density()

The issue is that this does not take the frequency variable (freq) into account; it simply computes the frequency itself from the number of rows. That is a problem because there is exactly one row for every income-group combination.

I believe I have two options, each of which has a question:

a) Should I plot the graph using the way the data is currently formatted? If so, how would I do that?

b) Should I reformat the data to make the frequency of each group/income combination equivalent to the freq variable assigned to it? If so, how would I do that?

This is the kind of graph I would like, where "income" = "rating" and "group" = "cond":


dput of 'full':

full <- structure(list(income = c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 1e+05, 10000, 19000,29000, 39000, 49000, 75000, 99000, 1e+05),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("one", "two"), class = "factor"),
freq = c(1237, 1791, 743, 291, 256, 212, 29, 11, 921, 1512, 614, 301, 209, 223, 48, 1)), .Names = c("income", "group", "freq"),
row.names = c(NA, 16L), class = "data.frame")

Upvotes: 2

Views: 1163

Answers (1)

MrFlick

Reputation: 206253

You can repeat the observations by their frequency with:

ggplot(full[rep(seq_len(nrow(full)), full$freq), ]) +
    geom_density(aes(x=income, fill=group), color="black", alpha=.75, adjust=4)

Of course, with your data this produces a pretty lousy plot.


When estimating a density, your data should be observations from a continuous distribution. Here you really have a discrete distribution with repeated observations (in a true continuous distribution, the probability of seeing any value more than once is 0).
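As a quick check on the two points above, the sketch below rebuilds `full` from the question, expands it by `freq`, and confirms that the expanded rows still contain only a handful of distinct income values. It also shows an alternative that is not part of the original answer: passing per-group normalized frequencies through the `weight` aesthetic of `geom_density()`, which avoids copying rows. The names `expanded` and `w` are made up for illustration. Note that the default bandwidth is chosen from the unweighted income values, so the weighted curve will generally not match the expanded-data curve unless `bw` or `adjust` is tuned.

```r
# Rebuild the data frame from the question's dput (same values, shorter form).
full <- data.frame(
  income = rep(c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 100000), 2),
  group  = factor(rep(c("one", "two"), each = 8)),
  freq   = c(1237, 1791, 743, 291, 256, 212, 29, 11,
             921, 1512, 614, 301, 209, 223, 48, 1)
)

# Repeat each row index `freq` times: one row per (implied) observation.
expanded <- full[rep(seq_len(nrow(full)), times = full$freq), c("income", "group")]

# The expanded data has sum(freq) rows but still only a few distinct incomes,
# which is why the density estimate looks spiky.
n_rows     <- nrow(expanded)
n_distinct <- length(unique(expanded$income))

# Alternative (assumption, not from the answer): weights that sum to 1 per group.
full$w <- full$freq / ave(full$freq, full$group, FUN = sum)

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(full, aes(x = income, fill = group, weight = w)) +
    geom_density(color = "black", alpha = 0.75)
  print(p)
}
```

Either route feeds the same discrete data to the density estimator, so the caveat about discreteness applies to both.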

You could try to smooth this curve by setting the adjust= parameter to a number greater than 1 (like 3 or 4; the call above already uses adjust=4). But really, your input data is just not in an appropriate form for a density plot. A bar plot would be a better choice. Maybe something like

ggplot(full, aes(as.factor(income), freq, fill=group)) + 
    geom_bar(stat="identity", position="dodge")

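If the goal is to compare the shape of the two groups rather than their raw counts, one possible variation is to plot each bar as a share of its own group, since the two groups have different totals. This is a minimal sketch under that assumption; the column name `share` and the axis labels are made up for illustration and are not part of the question or the answer above.

```r
# Rebuild the data frame from the question's dput (same values, shorter form).
full <- data.frame(
  income = rep(c(10000, 19000, 29000, 39000, 49000, 75000, 99000, 100000), 2),
  group  = factor(rep(c("one", "two"), each = 8)),
  freq   = c(1237, 1791, 743, 291, 256, 212, 29, 11,
             921, 1512, 614, 301, 209, 223, 48, 1)
)

# Share of each income level within its own group (sums to 1 per group).
full$share <- full$freq / ave(full$freq, full$group, FUN = sum)

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(full, aes(x = as.factor(income), y = share, fill = group)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(x = "income", y = "share of group")
  print(p)
}
```

With shares on the y axis, bar heights are comparable across groups even though the groups differ in size.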

Upvotes: 2
