Reputation: 1214
I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:
dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = T)
wherein not all possible classes may have been present in the zone.
I can pre-compute the data summary and use geom_bar
and a manual colour scale to get consistent bar colours regardless of missing data:
library(dplyr)
library(ggplot2)
library(viridis)
dat_summ <- dat %>%
group_by(CLASS, .drop = FALSE) %>%
summarise(percentage = n() / nrow(.) * 100)
mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]
ggplot(dat_summ) +
geom_bar(aes(x = CLASS, y = percentage, fill = CLASS),
stat = 'identity', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = mancols, drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
But I can't keep the colours consistent across plots when I try to use geom_histogram
:
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = ..x..), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_c() +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b()
doesn't solve the problem - it always rescales the palette against the number of non-0 columns.
Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar
approach?
Upvotes: 1
Views: 802
Reputation: 6786
Maybe scale_fill_discrete/scale_fill_viridis_d(drop = F)
is what you want (with fill = CLASS
).
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_d(drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
Upvotes: 2
Reputation: 1495
I think that the problem is that you pass the calculated variable ..x..
to fill
in the aesthetics. It appears the length of this variable changes with your data set. You could replace it with scale_fill_manual
and you will get the same plot colours regardless of how many levels there are in your CLASS variable:
ggplot(dat) +
geom_histogram(aes(x = CLASS, y = stat(count/sum(count) * 100), fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"))
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
Upvotes: 1