Reputation: 3
I have a data frame containing a numerical variable (percentages) and categorical variables. I'd like to produce a stacked bar plot (using ggplot2) with the columns (one per category) sorted by the numerical variable.
I tried this:
How to control ordering of stacked bar chart using identity on ggplot2
and this:
https://community.rstudio.com/t/a-tidy-way-to-order-stacked-bar-chart-by-fill-subset/5134
but I am not familiar with factors and I'd like to understand more.
library(ggplot2)

# Reproduce a dummy dataset
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
names <- c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D')
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = perc,
                 name = names)
# Plot
ggplot(df, aes(x = factor(name), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')
This code produces a plot whose bars are ordered by the variable "name". I'd like to order them by the variable "percentage". Even if I manually reorder the rows of the data frame, the resulting plot is the same.
Upvotes: 0
Views: 785
Reputation: 36
Changing the levels before plotting will do it for you.
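sub = df[df$class == -1, ]
lvlorder = order(sub$percentage, decreasing = TRUE)
# levels(df$name) is NULL when name is a character column, so take the levels from the names themselves
df$name = factor(df$name, levels = as.character(sub$name[lvlorder]))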
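Since the question asks for more background on factors, here is a minimal base-R sketch of why changing the levels matters. The vector name and the level order below are made up for illustration. A discrete axis in ggplot2 follows the factor's level order, not the row order, and factor() sorts the levels alphabetically unless you pass levels yourself:

```r
# Illustrative values only: rows deliberately not in alphabetical order
name <- c('B', 'A', 'D', 'C')

# Default levels are the sorted unique values, whatever the row order is
default_levels <- levels(factor(name))
print(default_levels)

# Passing levels explicitly fixes the order that a discrete axis will use
custom_levels <- levels(factor(name, levels = c('A', 'D', 'C', 'B')))
print(custom_levels)
```

This is also why sorting the rows of the data frame by hand leaves the plot unchanged: the levels stay the same.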
Upvotes: 0
Reputation: 545608
The issue here is that all your percentages for a given category (name) in fact add up to 100%. So sorting by percentage, which is normally achieved via aes(x = reorder(name, percentage), y = percentage), won’t work here.
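A quick check in base R illustrates the problem. This is only a sketch: it rebuilds the data from the question so that it runs on its own, and the variable naive is a name chosen here. Every name ends up with the same mean score, so reorder has nothing meaningful to sort on:

```r
# Rebuild the question's data so the snippet is self-contained
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = c(11.89, 88.11, 2.56, 97.44,
                                5.96, 94.04, 6.74, 93.26),
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# reorder() scores each name by the mean of its percentages (the default FUN)
naive <- reorder(df$name, df$percentage)

# Each pair sums to 100, so every score is 50 up to floating-point rounding
print(attr(naive, "scores"))
```

Any ordering that comes out of this is a tie-break (or rounding noise), not an ordering by the data.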
Instead, you probably want to order by the percentage of the data that has class = 1 (or class = -1). Doing this requires some trickery: Use ifelse to select the percentage for the rows where class == 1. For all other rows, select the value 0:
ggplot(df, aes(x = reorder(name, ifelse(class == 1, percentage, 0)),
               y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')
You might want to execute just the reorder instruction to see what’s going on:
reorder(df$name, ifelse(df$class == 1, df$percentage, 0))
# [1] A A B B C C D D
# attr(,"scores")
# A B C D
# 44.055 48.720 47.020 46.630
# Levels: A D C B
As you can see, your names got reordered based on the mean percentage for each category (by default, reorder uses the mean; see its manual page for more details). But the “mean” we calculated was between each name’s percentage for class = 1, and the value 0 (for class ≠ 1). Each score is therefore half of the class = 1 percentage, and halving every score leaves the ordering unchanged.
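If the halved scores look confusing, one option is to pass a different summary function through reorder's FUN argument. The sketch below is an assumption about what you might prefer, not a required step; it rebuilds the question's data so it runs on its own, and by_max is a name chosen here. With FUN = max, each name's score is its class = 1 percentage itself:

```r
# Rebuild the question's data so the snippet is self-contained
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = c(11.89, 88.11, 2.56, 97.44,
                                5.96, 94.04, 6.74, 93.26),
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Score each name by the maximum of (class-1 percentage, 0) instead of the mean
by_max <- reorder(df$name, ifelse(df$class == 1, df$percentage, 0), FUN = max)

print(attr(by_max, "scores"))  # the class = 1 percentages themselves
print(levels(by_max))          # same level order as with the default mean
```

To sort in descending order instead, negate the score, for example ifelse(class == 1, -percentage, 0) with FUN = min.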
Upvotes: 1
Reputation: 4592
This is similar to Konrad Rudolph's answer; I just create the factor levels first and then use them to reorder. Here is my solution:
x_order <- with(subset(df, class == -1), reorder(name, percentage))
df$name <- factor(df$name, levels = levels(x_order))
ggplot(df, aes(x = name, y = percentage, fill = factor(class))) +
geom_bar(stat = "identity") +
scale_x_discrete(breaks = levels(x_order))
Upvotes: 0