Reputation: 1
I have a problem with a density histogram in ggplot2. I am working in RStudio and trying to create a density histogram of income, broken down by each person's occupation. The problem appears when I run this code:
data = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
sep=",",header=F,col.names=c("age", "type_employer", "fnlwgt", "education",
"education_num","marital", "occupation", "relationship", "race","sex",
"capital_gain", "capital_loss", "hr_per_week","country", "income"),
fill=FALSE,strip.white=T)
ggplot(data=data, aes(x=income)) +
geom_histogram(stat='count',
aes(x= income, y=stat(count)/sum(stat(count)),
col=occupation, fill=occupation),
position='dodge')
The result is a histogram in which each bar is divided by the overall count across all categories. What I want instead is, for example, the number of people earning >50K whose occupation is 'craft-repair' divided by the total number of people whose occupation is 'craft-repair', the same for <=50K within that occupation, and so on for every other occupation.
My second question: once I have a proper density histogram, how can I sort the bars in decreasing order?
Upvotes: 0
Views: 635
Reputation: 7312
This is a situation where it makes sense to re-aggregate your data before plotting. Aggregating within the ggplot call works fine for simple aggregations, but it doesn't work so well when you need to aggregate and then compute a second quantity within a subgroup. Also, note that because your x axis is discrete, a histogram isn't the right tool here; we'll use geom_bar() instead.
First we count the rows for each income/occupation pair, then calculate each count's share of the total within its occupation, using occupation as the group.
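library(dplyr)  # provides %>%, group_by(), summarize(), mutate()

d2 <- data %>% group_by(income, occupation) %>%
  summarize(count = n()) %>%              # rows per income/occupation pair
  group_by(occupation) %>%
  mutate(percent = count / sum(count))    # share within each occupation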
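To check that the two-step grouping really divides by the occupation total rather than the grand total, here is a minimal sketch on a made-up toy table (the `toy` data frame and its values are assumptions for illustration, not the adult data set). It uses `count()` from dplyr as a shorthand for the `group_by()` plus `summarize(n())` step.

```r
library(dplyr)

# Toy stand-in for the adult data (made-up rows, for illustration only)
toy <- data.frame(
  occupation = c("Craft-repair", "Craft-repair", "Craft-repair", "Craft-repair",
                 "Sales", "Sales"),
  income     = c("<=50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K")
)

toy_pct <- toy %>%
  count(occupation, income, name = "count") %>%   # rows per occupation/income pair
  group_by(occupation) %>%
  mutate(percent = count / sum(count)) %>%        # share within each occupation
  ungroup()

print(toy_pct)
```

In this toy table, Craft-repair has 3 rows at <=50K and 1 row at >50K, so its shares are 0.75 and 0.25, and the shares within every occupation add up to 1.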
Then simply plot a bar chart using geom_bar and position = 'dodge' so the bars are side by side, rather than stacked.
library(ggplot2)

d2 %>% ggplot(aes(income, percent, fill = occupation)) +
  geom_bar(stat = 'identity', position = 'dodge')
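For the second question (sorting the bars in decreasing order), one possible approach is to reorder the occupation factor by the computed share with base R's `reorder()`. The sketch below is only one way to do it and makes two assumptions: it uses a made-up toy table (`d2_toy`) with the same columns as the aggregated data, and it puts occupation on the x axis and keeps only the >50K rows so that there is a single bar per occupation to sort.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the aggregated table (made-up values, for illustration only)
d2_toy <- data.frame(
  occupation = rep(c("Craft-repair", "Sales", "Exec-managerial"), each = 2),
  income     = rep(c("<=50K", ">50K"), times = 3),
  percent    = c(0.75, 0.25, 0.70, 0.30, 0.50, 0.50)
)

sorted <- d2_toy %>%
  filter(income == ">50K") %>%
  # reorder() sorts the factor levels by percent; the minus sign makes it decreasing
  mutate(occupation = reorder(occupation, -percent))

p <- ggplot(sorted, aes(occupation, percent)) +
  geom_col() +                                  # same as geom_bar(stat = 'identity')
  labs(y = "share earning >50K")

print(p)
```

ggplot2 draws discrete axes in factor-level order, so the bars appear from the highest share to the lowest. With the dodged plot, the same idea applies to the fill variable: the order of the bars within each income group follows the factor levels of occupation.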
Upvotes: 2