I have three ways of producing a plot, each of which is only one step away from what I want. I am using the training data set from Kaggle's Titanic competition, and I would like a plot faceted on Pclass (socio-economic class) in which each bar shows the percentage of passengers within that facet who lived or died (the binary variable Survived). I would also like the bars colored by Survived. Here are my three plots:
g <- ggplot(training, aes(Survived, y = ..prop.., group = Survived))
g <- g + geom_bar(aes(fill = Survived), position = "dodge", stat = "count")
g <- g + facet_grid(~Pclass)
g <- g + scale_y_continuous(labels = scales::percent)
g <- g + labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
g
q <- qplot(x = Survived, y = ..prop.., data = training, geom = "bar",
           fill = Survived, facets = ~Pclass, stat = "count") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
q
f <- ggplot(training, aes(Survived, group = Survived))
f <- f + geom_histogram(aes(fill = Survived), position = "fill", stat = "count")
f <- f + facet_grid(~Pclass)
f <- f + scale_y_continuous(labels = scales::percent)
f <- f + labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
f
They all look exactly the same. The only problem is that the Survived and Died bars within each facet both show 100%. Any ideas how to get the percentages right within each facet?
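To see why every bar reads 100%, it helps to inspect the proportions ggplot2 actually computes. The sketch below calls ggplot2's layer_data() on a small hypothetical stand-in for the Kaggle data. The ten rows are made up and are not the real data set. It also uses the after_stat(prop) spelling of ..prop.., which assumes ggplot2 3.3.0 or later. prop is computed within each group. With group = Survived, every group holds a single bar, so its share is always 1. With group = 1, each panel becomes one group.

```r
library(ggplot2)

# Hypothetical toy stand-in for Kaggle's training data (not the real data set)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# group = Survived: each group contains exactly one bar, so prop is always 1
p_bad <- ggplot(training, aes(factor(Survived), y = after_stat(prop), group = Survived)) +
  geom_bar() +
  facet_grid(~Pclass)

# group = 1: one group per panel, so prop is each bar's share of its panel
p_good <- ggplot(training, aes(factor(Survived), y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(~Pclass)

# layer_data() returns the data ggplot2 computed for the first layer
bad  <- layer_data(p_bad)
good <- layer_data(p_good)
print(bad[, c("PANEL", "x", "count", "prop")])
print(good[, c("PANEL", "x", "count", "prop")])
```

With the toy data, every prop in bad is 1. In good, the two bars of each panel sum to 1; for example, the Lived bar for class 1 is 2/3.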
I think this is what you're going for. To get group percentages with facets, use geom_bar, ..prop.., and specify the facet variable as the group:
f <- ggplot(training,
            aes(x = factor(Survived, labels = c("Died", "Lived"))))
# group = Pclass: proportions are computed per class, i.e. per facet
f <- f + geom_bar(aes(y = ..prop.., group = Pclass,
                      fill = factor(..x.., labels = c("Died", "Lived"))))
f <- f + facet_grid(~factor(Pclass,
                            labels = c("Upper Class", "Middle Class", "Lower Class")))
f <- f + scale_y_continuous(labels = scales::percent)
f <- f + scale_fill_discrete(name = "Survival Status")
f <- f + labs(x = "", y = "Percentage", title = "The Probability of Living Given Socio-Economic Status")
f
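The fill mapping needs a workaround, though. The code above works, but fill = Survived does not. The bars are grouped by Pclass, so Survived varies within each group, and stat_count drops aesthetics that are not constant within a group. Mapping fill to the computed x value instead (..x.., relabelled as a factor) applies the colour after the counts and proportions have been computed. That is why x has to be re-factored as shown.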
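A quick way to check the fill behaviour is to compare the built layer data for the two mappings. This sketch assumes ggplot2 3.3.0 or later, for after_stat(), which is the newer spelling of ..x.. and ..prop... It again uses a small hypothetical stand-in for training. Mapping fill to the raw Survived column while grouping by Pclass leaves every bar with the same default fill, and recent ggplot2 versions may also warn that the aesthetic was dropped. Computing fill from the stat's x output keeps both colours.

```r
library(ggplot2)

# Hypothetical toy stand-in for the Kaggle training data
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)
died_lived <- c("Died", "Lived")

# fill mapped to the raw column: Survived varies inside each Pclass group,
# so stat_count() drops it and every bar falls back to the default fill
p_dropped <- ggplot(training, aes(x = factor(Survived, labels = died_lived))) +
  geom_bar(aes(y = after_stat(prop), group = Pclass,
               fill = factor(Survived, labels = died_lived))) +
  facet_grid(~Pclass)

# fill computed after the stat from the x position (1 = Died, 2 = Lived)
p_kept <- ggplot(training, aes(x = factor(Survived, labels = died_lived))) +
  geom_bar(aes(y = after_stat(prop), group = Pclass,
               fill = after_stat(factor(x, labels = died_lived)))) +
  facet_grid(~Pclass)

dropped <- suppressWarnings(layer_data(p_dropped))
kept    <- layer_data(p_kept)
length(unique(dropped$fill))  # a single colour: the fill mapping was lost
length(unique(kept$fill))     # two colours: Died and Lived
```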
As a side note, when two bars always add up to 100%, showing them side by side may not be the best choice. Stacking them shows the proportions more clearly.
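A stacked version could look like the sketch below. It puts one bar per class on the x axis and uses position = "fill", so each bar is rescaled to 100% and the Died and Lived segments show the within-class shares. The ten-row training data frame is a hypothetical stand-in for the Kaggle file, with Survived assumed to be coded 0/1.

```r
library(ggplot2)

# Hypothetical toy stand-in for the Kaggle training data (Survived coded 0/1)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# One stacked bar per class; position = "fill" rescales each stack to 1,
# so each coloured segment is that outcome's share within the class
p_stacked <- ggplot(training,
                    aes(x = factor(Pclass, labels = c("Upper", "Middle", "Lower")),
                        fill = factor(Survived, labels = c("Died", "Lived")))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Socio-economic class", y = "Percentage", fill = "Survival Status",
       title = "The Probability of Living Given Socio-Economic Status")

p_stacked
```

This design uses a single bar per class, so comparing survival across classes means comparing the heights of one coloured segment rather than reading pairs of bars across facets.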
I'm not sure about your "y = ..prop.." argument. The code below calculates the survival and death rates for each class ahead of time, and those plot without problems.
library(tidyverse)

training %>%
  group_by(Pclass) %>%
  summarise(
    survival_rate = mean(Survived),
    death_rate = 1 - survival_rate
  ) %>%
  gather(survival_rate, death_rate, key = rate_type, value = rate) %>%
  ggplot(., aes(x = rate_type, y = rate, fill = rate_type)) +
  geom_col(position = "dodge") +
  facet_grid(~Pclass, labeller = as_labeller(c(
    "1" = "First Class", "2" = "Second Class", "3" = "Third Class"))
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL,
       y = "Survival Rate",
       title = "The Probability of Living Given Socio-Economic Status")
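gather() still works, but tidyr 1.0.0 and later recommend pivot_longer() for the same reshaping. Below is a sketch of the summarise-and-reshape step using pivot_longer(). It runs on a small hypothetical stand-in for training, with Survived assumed to be coded 0/1, since mean() needs a numeric column. The resulting rates table can be passed to the same ggplot() call as in the pipeline above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for the Kaggle training data (Survived coded 0/1)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Per-class rates: the mean of a 0/1 variable is the proportion of 1s
rates <- training %>%
  group_by(Pclass) %>%
  summarise(
    survival_rate = mean(Survived),
    death_rate = 1 - survival_rate
  ) %>%
  # long format: one row per (class, rate type), ready for geom_col()
  pivot_longer(c(survival_rate, death_rate),
               names_to = "rate_type", values_to = "rate")

rates
```

For the toy data this gives six rows, with a survival rate of 2/3 for class 1.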