Reputation: 109
I'm working with the "NYC Property Sales" dataset, which is available on Kaggle: https://www.kaggle.com/new-york-city/nyc-property-sales?select=nyc-rolling-sales.csv
After cleaning the dataset, I produced the following barplot with this code:
nyc_clean %>%
  filter(year == 2017,
         borough == "Manhatten") %>%
  add_count(neighborhood) %>%
  mutate(neighborhood = fct_reorder(neighborhood, n) %>% fct_rev()) %>%
  filter(as.numeric(neighborhood) <= 13) %>%
  distinct(borough, block, lot, .keep_all = TRUE) %>%
  pivot_longer(c("residential_units", "commercial_units"),
               names_to = "type",
               values_to = "count") %>%
  mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)),
                                    mean, na.rm = TRUE)) %>%
  ggplot(aes(neighborhood, count, fill = type)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  theme_light()
I want to reorder the barplot so that the proportion of residential units is in descending order (from top to bottom). In the code above, I tried to reorder the neighborhoods with fct_reorder, but it doesn't have any effect on the plot.
As a reproducible example, consider this dataset:
df <- tibble(neighborhood = c(rep("Chelsea", 4), rep("Tribeca", 4),
                              rep("Flatiron", 4)),
             type = c("residential_unit", "commercial_unit", "residential_unit",
                      "commercial_unit", "residential_unit", "commercial_unit",
                      "residential_unit", "commercial_unit", "residential_unit",
                      "commercial_unit", "residential_unit", "commercial_unit"),
             count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1))
When I try to reorder this plot, the bars are ordered just as messily as in my output above:
df %>%
  mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)),
                                    mean, na.rm = TRUE)) %>%
  ggplot(aes(neighborhood, count, fill = type)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_light()
Any ideas on what I'm missing here?
Upvotes: 1
Views: 3241
Reputation: 67010
Hopefully this makes up for lack of concision with clarity:
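library(dplyr)
library(forcats)
library(ggplot2)

df %>%
  left_join(  # Add res_share for each neighborhood
    df %>%
      # One row per neighborhood: residential units as a share of all units.
      # Summarising (instead of keeping one row per original record) avoids a
      # many-to-many join when a neighborhood has several rows per type.
      summarize(res_share = sum(count[type == "residential_unit"]) / sum(count),
                .by = neighborhood),
    by = "neighborhood"
  ) %>%
  # With coord_flip(), the last factor level is drawn at the top,
  # so the highest residential share ends up first from top to bottom.
  mutate(neighborhood = fct_reorder(neighborhood, res_share)) %>%
  ggplot(aes(neighborhood, count, fill = type)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_light()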
(Edited in 2024 to use the dplyr 1.1.0+ .by syntax, which is cleaner than the group_by(neighborhood) %>% ... %>% ungroup() syntax I had used originally.)
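For anyone comparing the two grouping styles mentioned in the edit note, here is a minimal sketch on the toy data from the question. It assumes dplyr 1.1.0 or later; the names old_style and new_style are only illustrative.

```r
library(dplyr)

# Toy data from the question, written more compactly
df <- tibble(
  neighborhood = rep(c("Chelsea", "Tribeca", "Flatiron"), each = 4),
  type = rep(c("residential_unit", "commercial_unit"), times = 6),
  count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1)
)

# Older style: persistent grouping that has to be removed afterwards
old_style <- df %>%
  group_by(neighborhood) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

# dplyr 1.1.0+: grouping applies to this one mutate() call only
new_style <- df %>%
  mutate(share = count / sum(count), .by = neighborhood)

# Both approaches should compute the same per-neighborhood shares
identical(old_style$share, new_style$share)
```

The .by form leaves the result ungrouped, so there is no ungroup() call to forget further down the pipeline.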
Upvotes: 3
Reputation: 389325
One way would be to arrange the data by 'residential_unit' and count, and assign factor levels in the order they appear.
library(dplyr)
library(ggplot2)

# Neighborhoods ordered by their residential proportion (ascending)
levels <- df %>%
  group_by(neighborhood, type) %>%
  summarise(prop = sum(count), .groups = "drop_last") %>%  # still grouped by neighborhood
  mutate(prop = prop.table(prop)) %>%                      # proportion within each neighborhood
  arrange(type != 'residential_unit', prop) %>%            # residential rows first, then by prop
  pull(neighborhood) %>%
  unique()

df %>%
  mutate(neighborhood = factor(neighborhood, levels)) %>%
  ggplot(aes(neighborhood, count, fill = type)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_light()
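As a rough check, assuming the toy df from the question, the sketch below shows why the original fct_reorder call could not change the order, and what order the residential share gives. The names tie_check and res_order are only illustrative.

```r
library(dplyr)

# Toy data from the question, written more compactly
df <- tibble(
  neighborhood = rep(c("Chelsea", "Tribeca", "Flatiron"), each = 4),
  type = rep(c("residential_unit", "commercial_unit"), times = 6),
  count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1)
)

# The sort key used in the question: the mean of the type codes
# (commercial_unit = 1, residential_unit = 2). Every neighborhood has the
# same number of rows of each type, so the key is 1.5 everywhere and
# fct_reorder has nothing to sort on.
tie_check <- df %>%
  group_by(neighborhood) %>%
  summarise(key = mean(as.numeric(as.factor(type))))

# A key that does differ between neighborhoods: residential units as a
# share of all units, sorted ascending like the factor levels above.
res_order <- df %>%
  group_by(neighborhood) %>%
  summarise(res_share = sum(count[type == "residential_unit"]) / sum(count)) %>%
  arrange(res_share)

tie_check
res_order
```

On this data the ascending order is Tribeca, Chelsea, Flatiron, which coord_flip() draws from bottom to top, so Flatiron appears first when reading down the plot.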
Upvotes: 2