Reputation: 5819
I want to plot faceted bar graphs and order them left-to-right from the largest to smallest values. I should be able to do this with code similar to this:
library(ggplot2)
ggplot(mpg, aes(reorder(cyl, -hwy), hwy)) +
  geom_col() +
  facet_wrap(~ manufacturer, scales = "free")
Instead what I get is ordering by the x-axis which happens to be 'cyl', smallest to largest values. How do I order descending, by the y-axis, so it looks like a Pareto chart? It has to be faceted as well. Thank you.
Upvotes: 5
Views: 2005
Reputation: 93821
If I understand your question, the goal is to plot the average highway mpg (the hwy column) by cyl for each manufacturer. Within each manufacturer, you want to order the x-axis (the cyl values) by the mean hwy value for each cyl.
To do that, we need to create the plots separately for each manufacturer and then lay them out together. This is because we can't have different x-axis orderings (cyl orderings in this case) for different panels in the same plot. (UPDATE: I stand corrected. @missuse's answer links to functions written by David Robinson, based on a blog post by Tyler Rinker, to vary the x-axis label order in faceted plots.) So, we'll create a list of plots and then lay them out together, as if they were faceted.
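To illustrate why a single reorder() call cannot give each panel its own order, here is a minimal base R sketch. The toy data frame and its values are made up for illustration (they are not taken from mpg); the point is only that reorder() produces one level order computed over all rows.

```r
# Hypothetical toy data: two groups whose largest category differs.
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  cat   = c("4", "8", "4", "8"),
  value = c(10, 40, 30, 5),
  stringsAsFactors = FALSE
)

# reorder() sorts the levels by a summary (mean by default) of -value
# computed over ALL rows, so every panel shares this one level order.
global_levels <- levels(reorder(toy$cat, -toy$value))

# The order each group would need on its own, largest value first.
per_group <- lapply(split(toy, toy$group), function(d) {
  d$cat[order(-d$value)]
})

print(global_levels)
print(per_group)
```

In this toy case the global order matches what group "a" needs but not what group "b" needs, which is the same mismatch the question runs into with cyl.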
library(tidyverse)
library(egg)
Since in the real data, the mean value of hwy is always monotonically decreasing with increasing cyl, we'll create an artificially high hwy value for 8-cylinder Audis, just for illustration:
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] = 40
Now we split the data by manufacturer so we can create a separate plot, and therefore a separate cyl ordering, for each manufacturer. We'll use the map function to iterate over the manufacturers.
plot.list = split(mpg, mpg$manufacturer) %>%
  map(function(dat) {
    # Order cyl by mean(hwy)
    dat = dat %>% group_by(manufacturer, cyl) %>%
      summarise(hwy = mean(hwy)) %>%
      arrange(desc(hwy)) %>%
      mutate(cyl = factor(cyl, levels=cyl))
    ggplot(dat, aes(cyl, hwy)) +
      geom_col() +
      facet_wrap(~ manufacturer) +
      theme(axis.title=element_blank()) +
      expand_limits(y=mpg %>%
                      group_by(manufacturer,cyl) %>%
                      mutate(hwy=mean(hwy)) %>%
                      pull(hwy) %>% max)
  })
Now let's remove the y-axis values and ticks from the plots that won't be in the first column when we lay out the plots together:
num_cols = 5
plot.list[-seq(1,length(plot.list), num_cols)] =
  lapply(plot.list[-seq(1,length(plot.list), num_cols)], function(p) {
    p + theme(axis.text.y=element_blank(),
              axis.ticks.y=element_blank())
  })
Finally, we lay out the plots. ggarrange from the egg package ensures that the panels all have the same width (otherwise the panels in the first column would be narrower than the others, due to space taken up by the y-axis labels).
ggarrange(plots=plot.list, left="Highway MPG", bottom="Cylinders", ncol=num_cols)
Note that the cyl values for audi are not in increasing order, showing that our reordering worked properly.
Upvotes: 4
Reputation: 19716
Here is a different approach that can be performed directly in ggplot utilizing two functions from here. I will use eipi10's example:
library(tidyverse)
mpg$hwy[mpg$manufacturer=="audi" & mpg$cyl==8] <- 40
dat <- mpg %>% group_by(manufacturer, cyl) %>%
  summarise(hwy = mean(hwy)) %>%
  arrange(desc(hwy)) %>%
  mutate(cyl = factor(cyl, levels = cyl))
Functions:
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}
scale_x_reordered <- function(..., sep = "___") {
  reg <- paste0(sep, ".+$")
  ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}
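To show what these two functions do under the hood, here is a small base R sketch on made-up toy data (the toy data frame and the strip_suffix helper are hypothetical names used only for illustration). reorder_within is repeated so the sketch runs on its own; strip_suffix mimics the labels function that scale_x_reordered passes to scale_x_discrete.

```r
# Same definition as in the answer, repeated so this sketch is self-contained.
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Hypothetical toy data: two groups whose largest category differs.
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  cat   = c("4", "8", "4", "8"),
  value = c(10, 40, 30, 5),
  stringsAsFactors = FALSE
)

# Each category is suffixed with its group, so "4" in group a and "4" in
# group b become distinct levels that can be ordered independently.
keyed <- reorder_within(toy$cat, -toy$value, toy$group)
print(levels(keyed))

# Hypothetical helper: strips the "___group" suffix again, as the labels
# function inside scale_x_reordered() does with the default separator.
strip_suffix <- function(x) gsub("___.+$", "", x)
print(strip_suffix(levels(keyed)))
```

Because every level now belongs to exactly one group, a facet drawn with free x scales shows only its own levels, in the order set by reorder(), and the suffix is removed only when the axis labels are drawn.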
Plot:
ggplot(dat, aes(reorder_within(cyl, -hwy, manufacturer), y = hwy)) +
  geom_col() +
  scale_x_reordered() +
  facet_wrap(~ manufacturer, scales = "free") +
  theme(axis.title=element_blank())
For ascending order, use reorder_within(cyl, hwy, manufacturer) instead.
Plot without the functions:
ggplot(dat, aes(cyl, y = hwy)) +
  geom_col() +
  facet_wrap(~ manufacturer, scales = "free") +
  theme(axis.title=element_blank())
Upvotes: 6