I am trying to create a stacked bar chart in ggplot2 to display the percentage of values corresponding to each categorical variable. Here's an example of the data that I am trying to work with.
sampledf <- data.frame("Death" = rep(0:1, each = 5),
"HabitA" = rep(0:1, c(3, 7)),
"HabitB" = rep(1:2, c(4, 6)),
"HabitC" = rep(0:1, c(6, 4)))
The habit columns are the ones I want to show in the stacked bar chart, and I want to use the Death column in facet_grid. I'm looking to show the percentage of each value within each habit.
The data I think I need for the chart should translate to this: under Death = 0, 60% of HabitA values are 0 and 40% are 1, while under Death = 1, 100% of HabitA values are 1.
I have produced charts like this with ggplot, group_by, and summarise for a single attribute, but I am not sure how this works with multiple categorical attributes in the data.
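As a quick sanity check on those target numbers, the shares for a single habit can be computed with base R's table() and prop.table(). This is only a minimal sketch for HabitA; the name habitA_share is illustrative and not part of the original post.

```r
# Recreate the sample data from the question
sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

# Cross-tabulate Death against HabitA, then convert each row
# (each Death group) to proportions that sum to 1
habitA_share <- prop.table(
  table(Death = sampledf$Death, HabitA = sampledf$HabitA),
  margin = 1
)
print(habitA_share)
```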
sampledf %>%
group_by(Death, HabitA) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
This produces what I want for just one variable, but when I add another attribute to group_by, it returns counts and percentages for each combination of all three attributes, which is not what I am looking for. I tried summarise_at/mutate_at, but that doesn't seem to work.
sampledf %>%
group_by(Death) %>%
mutate_at(c("HabitA", "HabitB"), Counts = n())
Is there a straightforward way to do this in R, and use the resulting data as input for ggplot2?
Edit:
I tried reshaping the data and using the long form to build my plot. Here's what I have.
long <- melt(sampledf, id.vars = c("Death"))
The resulting data is in this format.
  Death variable value
1     0   HabitA     0
2     0   HabitA     0
3     0   HabitA     0
4     0   HabitA     1
5     0   HabitA     1
6     1   HabitA     1
7     1   HabitA     1
I'm not sure how to use the value attribute to build the plot, because the ggplot I am currently trying to build is counting the total number of times each level occurs in the variable column.
ggplot(long, aes(x = variable, fill = variable)) +
geom_bar(stat = "count", position = "dodge") + facet_grid(~ Death)
Upvotes: 1
Views: 3556
Try this; it may not be the most straightforward approach, but it works. It reshapes the data with gather, as @aosmith suggested. It then counts the observations in each Death + habitat + count group and divides by the number of observations in each Death + habitat group to get the percentage. Finally, summarize reduces the result to unique rows.
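library(dplyr)  # provides %>%, group_by(), mutate(), summarize()

sampledf_edited <- sampledf %>%
  tidyr::gather("habitat", "count", 2:4) %>%   # reshape the habit columns to long form
  group_by(Death, habitat, count) %>%
  mutate(observation = n()) %>%                # rows per Death/habitat/value
  ungroup() %>%
  group_by(Death, habitat) %>%
  mutate(percent = observation/n()) %>%        # share within each Death/habitat
  ungroup() %>%
  group_by(Death, habitat, count, percent) %>%
  summarize()                                  # keep one row per group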
It is necessary to convert count to a factor.
sampledf_edited$count <- as.factor(sampledf_edited$count)
Then plot with ggplot.
ggplot(sampledf_edited, aes(habitat, percent, fill = count)) +
geom_bar(stat = "identity") +
facet_grid(~ Death)
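As a possible alternative, assuming tidyr 1.0.0 or later (where pivot_longer() supersedes gather()), the same summary could be sketched more compactly with count(). The names habit, value, and habit_pct are illustrative and do not come from the code above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Recreate the sample data from the question
sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

habit_pct <- sampledf %>%
  # long form: one row per original row and habit column
  pivot_longer(-Death, names_to = "habit", values_to = "value") %>%
  # number of rows for each Death/habit/value combination (column n)
  count(Death, habit, value) %>%
  group_by(Death, habit) %>%
  # share of each value within its Death/habit group
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Stacked bars of the precomputed shares, one panel per Death value
p <- ggplot(habit_pct, aes(habit, percent, fill = factor(value))) +
  geom_col() +
  facet_grid(~ Death) +
  scale_y_continuous(labels = scales::percent)
print(p)
```

Wrapping value in factor() inside aes() plays the same role as the as.factor() conversion above, and geom_col() is shorthand for geom_bar(stat = "identity").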
If your question has been answered, please make sure to accept an answer for future reference.
---EDIT--- plot added
Upvotes: 2