iNyar

Reputation: 2326

Divergent stacked bar chart with ggplot2: issue with factor ordering in legend

I'm trying to plot Likert scale data on a divergent stacked bar chart with ggplot2.

I have seen many solutions, among which the best one I found is this faceted solution (no need for the facets though). I particularly appreciate the fact that, for odd-numbered scales, the neutral value is centered on 0.

I reproduced the idea (using two geom_col() with reversed counts) of this solution in a simplified way here:

library(tidyverse)  # tibble, dplyr (mutate, filter, if_else, %>%), ggplot2

# Data sample
data <- 
    tibble(
        question = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
        option = c("Very bad", "Bad", "Neutral", "Good", "Exc",
                   "Very bad", "Bad", "Neutral", "Good", "Exc"),
        count = c(1, 10, 4, 5, 3, 3, 4, 5, 6, 8)
        ) %>% 
    mutate(
        option = option %>% factor(levels = c("Very bad", "Bad", "Neutral", "Good", "Exc")),
        # Halve the neutral count: half is drawn on each side of 0
        count  = if_else(option == "Neutral", count/2, count)
        )

# Divergent stacked bar chart
data %>% 
    ggplot(aes(question, count, fill = option)) +
    # Positive side: neutral, good, excellent
    geom_col(data = filter(data, option %in% c("Neutral", "Good", "Exc")),
             position = position_stack(reverse = TRUE)) +
    # Negative side: neutral, bad, very bad (counts negated)
    geom_col(data = filter(data, option %in% c("Neutral", "Bad", "Very bad")),
             aes(y = -count)) +
    scale_fill_brewer(palette = "RdBu") +
    coord_flip()

Which gives the following result:

[Image: ggplot divergent stacked bar chart]

As you can see, the order of the plot is correct, but the legend and coloring seem to have forgotten the factor ordering (adding ordered = T to the factor doesn't help).

If I remove the second geom_col(), then all is fine, but that obviously ain't my goal.

How can I force ggplot2 to maintain factor ordering in the legend?

Upvotes: 2

Views: 831

Answers (1)

stefan

Reputation: 123938

The issue is that unused factor levels get dropped by default. I'm not sure about the exact internals, but it's related to the fact that you use two geom_col() layers with different datasets, each of which contains only some of the levels.

To solve your issue, set drop = FALSE in scale_fill_brewer():

library(ggplot2)
library(dplyr)  # for filter()

# Divergent stacked bar chart (uses `data` from the question)
ggplot(data, aes(question, count, fill = option)) +
  geom_col(data = filter(data, option %in% c("Neutral", "Good", "Exc")),
           position = position_stack(reverse = TRUE)) +
  geom_col(data = filter(data, option %in% c("Neutral", "Bad", "Very bad")),
           aes(y = -count)) +
  # drop = FALSE keeps all factor levels, in their defined order, in the legend
  scale_fill_brewer(palette = "RdBu", drop = FALSE) +
  coord_flip()
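
To make "unused factor levels" concrete, here is a minimal sketch that looks only at the data going into each layer. It is not a description of ggplot2's internals. The names option_levels, positive and negative are made up for illustration, and the halving of the neutral count is left out because it does not affect the levels.

```r
library(dplyr)
library(tibble)

# Illustrative rebuild of the question's data (neutral counts not halved here)
option_levels <- c("Very bad", "Bad", "Neutral", "Good", "Exc")

data <- tibble(
  question = rep(c("A", "B"), each = 5),
  option   = factor(rep(option_levels, times = 2), levels = option_levels),
  count    = c(1, 10, 4, 5, 3, 3, 4, 5, 6, 8)
)

# The two subsets passed to the two geom_col() layers
positive <- filter(data, option %in% c("Neutral", "Good", "Exc"))
negative <- filter(data, option %in% c("Neutral", "Bad", "Very bad"))

# filter() keeps all five levels on the factor itself ...
levels(positive$option)
# "Very bad" "Bad" "Neutral" "Good" "Exc"

# ... but only three of them are actually used in each subset
levels(droplevels(positive$option))
# "Neutral" "Good" "Exc"
levels(droplevels(negative$option))
# "Very bad" "Bad" "Neutral"
```

Neither layer on its own uses all five levels, which is consistent with the legend order going wrong once unused levels are dropped. With drop = FALSE the scale keeps the full set of levels from the factor instead.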

Upvotes: 2
