Karl Wolfschtagg

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing as a series of violin plots, with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categories were plotted in alphabetical order (I flipped the plot with coord_flip(), so they run alphabetically from bottom to top). I thought it would look better if I sorted them by the numeric variable.

When I do this, the colors don't come out the way I wanted. It's as if R assigned the colors to the violins before sorting them: after the sort, each violin kept its original color, which is the opposite of what I wanted. I wanted R to sort the violins first and then apply the color scheme.

I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.

Here is my code:

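# Load ggplot2 (ggplot(), geom_violin(), viridis scales)
library(ggplot2)

# Start plotting
g <- ggplot(NULL)

# Violin plot: x is reordered by descending mean numval,
# while fill is mapped to the original (un-reordered) catval
g <- g + geom_violin(data = df,
                     aes(x = reorder(catval, -numval, na.rm = TRUE),
                         y = numval, fill = catval),
                     trim = TRUE, scale = "width", adjust = 0.5)

(snip)

# Specify colors: the violins use fill, so this needs the fill scale
# (scale_colour_viridis_d() would only change the colour aesthetic)
g <- g + scale_fill_viridis_d()

# Remove legend
g <- g + theme(legend.position = "none")

# Flip for readability
g <- g + coord_flip()

# Produce plot
g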

Here is the resulting plot: [image: violin plot]

If I leave out the reorder() call inside geom_violin(), the colors come out in the order I want, but the categorical variable is then sorted alphabetically rather than by the numeric variable.

Is there a way to get what I'm after?


Answers (1)

Jon Spring

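I think this is a reproducible example of what you're seeing. In the diamonds dataset (which ships with ggplot2), mean price does not follow the quality order of cut: "Fair" diamonds have a higher mean price than "Good" or "Very Good" ones, and "Ideal" diamonds have the lowest mean of all.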

library(dplyr)
library(ggplot2)  # provides the diamonds dataset
diamonds %>%
  group_by(cut) %>%
  summarize(mean_price = mean(price))
# A tibble: 5 x 2
  cut       mean_price
  <ord>          <dbl>
1 Fair           4359.
2 Good           3929.
3 Very Good      3982.
4 Premium        4584.
5 Ideal          3458.

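By default, reorder() orders the levels by the mean of its second argument. Here that argument is -price, so the levels run from highest to lowest mean price, and because coord_flip() puts the first level at the bottom, Good (mean 3929) is plotted above Very Good (mean 3982). The fill, however, is still based on the un-reordered variable cut, which is an ordered factor in order of quality.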
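To see why the axis and the fill disagree without drawing anything, here is a minimal base-R sketch on a small made-up data frame (`toy`, `grade`, and `value` are hypothetical names and values chosen only for illustration). It compares the level order that a `fill = grade` mapping would use with the order that `reorder()` produces for the axis.

```r
# Hypothetical toy data: a factor with a fixed "quality" order plus a numeric value
toy <- data.frame(
  grade = factor(c("low", "low", "mid", "mid", "high", "high"),
                 levels = c("low", "mid", "high")),
  value = c(5, 7, 1, 3, 4, 6)
)

# The un-reordered factor keeps its original levels; this is what fill = grade uses
fill_levels <- levels(toy$grade)

# reorder() sorts levels by FUN (mean by default) of its second argument;
# negating the values gives descending-mean order, as in reorder(cut, -price)
x_factor <- reorder(toy$grade, -toy$value)
x_levels <- levels(x_factor)

# reorder() also stores the per-level scores (means of -value) it sorted on
scores <- attr(x_factor, "scores")

print(fill_levels)  # low, mid, high
print(x_levels)     # low, high, mid
print(scores)
```

The two level vectors differ, which is the same mismatch you see between the violin order and the colour order.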

ggplot(diamonds, aes(x = reorder(cut, -price),
                     y = price, fill = cut)) + 
  geom_violin() +
  coord_flip()


If you want the colors to follow the new ordering, you can either reorder in both aesthetics or reorder upstream of ggplot2:

ggplot(diamonds, aes(x = reorder(cut, -price),
                     y = price, 
                     fill = reorder(cut, -price))) + 
  geom_violin() +
  coord_flip()

Or

diamonds %>%
  mutate(cut = reorder(cut, -price)) %>%
  ggplot(aes(x = cut, y = price, fill = cut)) + 
  geom_violin() +
  coord_flip()
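Applied to the variable names from the question, the upstream approach might look like the sketch below. The `df` here is a hypothetical stand-in built from made-up values; only `catval`, `numval`, `na.rm = TRUE`, and the `geom_violin()` settings come from the question. Because the violins are coloured through `fill`, the sketch uses `scale_fill_viridis_d()`; a `scale_colour_*` scale would only affect the `colour` aesthetic. The plotting step is guarded so the reordering part still runs if ggplot2 is not installed.

```r
# Hypothetical stand-in for the question's df (values are made up)
df <- data.frame(
  catval = c("alpha", "alpha", "alpha", "beta", "beta", "beta",
             "gamma", "gamma", "gamma"),
  numval = c(2, 3, 4, 9, 11, NA, 5, 6, 7)
)

# Reorder once, upstream, by descending mean of numval (NAs dropped via na.rm)
df$catval <- reorder(factor(df$catval), -df$numval, na.rm = TRUE)

# x and fill will both use this single level order
print(levels(df$catval))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(df, aes(x = catval, y = numval, fill = catval)) +
    geom_violin(trim = TRUE, scale = "width", adjust = 0.5) +
    scale_fill_viridis_d() +          # fill, not colour, is the mapped aesthetic
    theme(legend.position = "none") +
    coord_flip()
  print(g)                            # ggplot2 warns that the NA row is removed
}
```

Since the reordering happens once on the data, there is no second reorder() call to keep in sync with the first.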
