PrashanthVajjhala

Reputation: 91

Stacked bar chart with multiple categorical variables in ggplot2 with facet_grid

I am trying to create a stacked bar chart in ggplot2 to display the percentage of values corresponding to each categorical variable. Here's an example of the data that I am trying to work with.

sampledf <- data.frame("Death" = rep(0:1, each = 5), 
                   "HabitA" = rep(0:1, c(3, 7)),
                   "HabitB" = rep(1:2, c(4, 6)),
                   "HabitC" = rep(0:1, c(6, 4)))

Each habit is a column that I want to use in the stacked bar chart, and I want to use the Death column in facet_grid. The bars should show the percentage of each value within each habit.

The data I think I need for the chart should translate to the following: under Death = 0, 60% of the HabitA values are 0 and 40% are 1, while under Death = 1, 100% of the HabitA values are 1.

I have produced charts like this with ggplot2, using group_by and summarise, for a single attribute, but I am not sure how to extend this to multiple categorical attributes in the data.

library(dplyr)

sampledf %>% 
  group_by(Death, HabitA) %>% 
  summarise(count=n()) %>% 
  mutate(perc=count/sum(count))

This produces what I want for just one variable, but when I add another attribute to the group_by call, it returns counts and percentages for each combination of all 3 attributes, which is not what I am looking for. I also tried summarise_at/mutate_at, but that does not seem to work either.

sampledf %>% 
  group_by(Death) %>% 
  mutate_at(c("HabitA", "HabitB"), Counts = n())

Is there a straightforward way to do this in R, and use the resulting data as input for ggplot2?

Edit:

I tried reshaping the data and using the long form to build my plot. Here's what I have.

library(reshape2)  # assumed source of melt(); the original post does not say which package was used
long <- melt(sampledf, id.vars = c("Death"))

The resulting data is in this format.

  Death variable value
1     0   HabitA     0
2     0   HabitA     0
3     0   HabitA     0
4     0   HabitA     1
5     0   HabitA     1
6     1   HabitA     1
7     1   HabitA     1

I'm not sure how to use the value column to build the plot, because the plot I am currently building counts the total number of times each level occurs in the variable column and ignores value.

ggplot(long, aes(x = variable, fill = variable)) +
  geom_bar(stat = "count", position = "dodge") + facet_grid(~ Death)

Upvotes: 1

Views: 3556

Answers (1)

user7886302

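Try this. It may not be the most straightforward approach, but it works. It first reshapes the data to long format with gather, as @aosmith suggested. The habit names end up in a column called habitat and the habit values in a column called count. It then counts the observations for each value within each Death + habitat group, divides by the group size to get the percentage, and finally summarizes to keep only the unique rows.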

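library(dplyr)

sampledf_edited <- sampledf %>% 
  # reshape to long format: one row per Death/habit/value observation
  tidyr::gather("habitat", "count", 2:4) %>% 
  # number of observations for each value within Death + habitat
  group_by(Death, habitat, count) %>% 
  mutate(observation = n()) %>% 
  ungroup() %>% 
  # divide by the total number of observations in each Death + habitat group
  group_by(Death, habitat) %>% 
  mutate(percent = observation/n()) %>% 
  ungroup() %>% 
  # keep one row per unique Death/habitat/count/percent combination
  group_by(Death, habitat, count, percent) %>%
  summarize()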
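For comparison, here is a shorter sketch of the same calculation. It is not part of the original answer: it assumes a recent tidyr (1.0.0 or later, for pivot_longer) and uses dplyr's count() instead of the mutate/summarize steps. The names habit, value, n_obs and habit_pct are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question.
sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

habit_pct <- sampledf %>%
  # Long format: one row per (Death, habit, value) observation.
  pivot_longer(-Death, names_to = "habit", values_to = "value") %>%
  # Number of rows for each Death/habit/value combination.
  count(Death, habit, value, name = "n_obs") %>%
  # Share of each value within its Death/habit group.
  group_by(Death, habit) %>%
  mutate(percent = n_obs / sum(n_obs)) %>%
  ungroup()

print(habit_pct)
```

For Death = 0 and HabitA this gives 0.6 for value 0 and 0.4 for value 1, which matches the numbers described in the question.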

It is necessary to convert count to a factor, so that ggplot treats it as a discrete fill variable instead of a continuous one.

sampledf_edited$count <- as.factor(sampledf_edited$count)

Then plot with ggplot2.

ggplot(sampledf_edited, aes(habitat, percent, fill = count)) +
  geom_bar(stat = "identity") +
  facet_grid(~ Death)
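If you only need the chart and not the percentage table, a possible alternative is to let ggplot2 do the normalising with position = "fill". This is a sketch under that assumption and not the method used above. The names long, habit, value and p are made up for this illustration, and scales::percent is used only to format the axis.

```r
library(ggplot2)
library(tidyr)

# Same sample data as in the question.
sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

# Long format: one row per (Death, habit, value) observation.
long <- pivot_longer(sampledf, -Death, names_to = "habit", values_to = "value")

# geom_bar() counts rows per habit/value; position = "fill" rescales
# each stacked bar so that its segments sum to 1.
p <- ggplot(long, aes(x = habit, fill = factor(value))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~ Death) +
  labs(y = "percent", fill = "value")

print(p)
```

The trade-off is that the percentages are never stored in a data frame, so this fits best when you do not need to label or reuse them.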

If your question has been answered, please make sure to accept an answer for future reference.

---EDIT--- plot added

[Image: the resulting ggplot]

Upvotes: 2
