obrl_soil

Reputation: 1214

geom_histogram with proportions and factor data

I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:

dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
                               prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = TRUE)

wherein not all possible classes may have been present in the zone.

I can pre-compute the data summary and use geom_bar and a manual colour scale to get consistent bar colours regardless of missing data:

library(dplyr)
library(ggplot2)
library(viridis)

dat_summ <- dat %>%
  group_by(CLASS, .drop = FALSE) %>%
  summarise(percentage = n() / nrow(.) * 100)

mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]

ggplot(dat_summ) +
  geom_bar(aes(x = CLASS, y = percentage, fill = CLASS), 
           stat = 'identity', show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_manual(values = mancols, drop = FALSE) +
  labs(x = 'Class', y = 'Percent') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())

[Image: geom_bar plot]

But I can't keep the colours consistent across plots when I try to use geom_histogram:

ggplot(dat) +
  geom_histogram(aes(x = CLASS,
                     y = (..count../sum(..count..)) * 100,
                     fill = ..x..),
                 stat = 'count', show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_viridis_c() +
  labs(x = 'Class', y = 'Percent') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())


If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b() doesn't solve the problem - it always rescales the palette against the number of non-0 columns.

Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar approach?

Upvotes: 1

Views: 802

Answers (2)

cuttlefish44

Reputation: 6786

Maybe scale_fill_discrete(drop = FALSE) or scale_fill_viridis_d(drop = FALSE) is what you want, with fill = CLASS instead of fill = ..x..:

ggplot(dat) +
  geom_histogram(aes(x = CLASS,  
                     y = (..count../sum(..count..)) * 100,
                     fill = CLASS), stat = 'count', show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_viridis_d(drop = FALSE) +
  labs(x = 'Class', y = 'Percent') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
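To check that this really keeps the colours fixed across zones, here is a minimal sketch. The helper names `plot_zone()` and `fill_of()` and the two made-up zones are my own, not from the question. It assumes ggplot2 >= 3.3.0 for `after_stat()`, and uses `geom_bar()` because its default stat is already "count".

```r
library(ggplot2)

# Every class that can occur in any zone (same six as in the question).
all_classes <- LETTERS[1:6]

# Hypothetical helper: percentage bar chart for one zone.
plot_zone <- function(zone_dat) {
  ggplot(zone_dat) +
    geom_bar(aes(x = CLASS,
                 y = after_stat(count / sum(count) * 100),
                 fill = CLASS),
             show.legend = FALSE) +
    scale_x_discrete(drop = FALSE) +     # keep empty classes on the axis
    scale_fill_viridis_d(drop = FALSE) + # keep empty classes in the palette
    labs(x = "Class", y = "Percent") +
    theme_minimal() +
    theme(panel.grid.minor = element_blank())
}

# Made-up zones: the second one has no A and no F.
zone_full <- data.frame(CLASS = factor(
  rep(all_classes, times = c(40, 15, 85, 100, 5, 5)),
  levels = all_classes))
zone_gap <- data.frame(CLASS = factor(
  rep(all_classes[2:5], times = c(20, 90, 100, 40)),
  levels = all_classes))

p_full <- plot_zone(zone_full)
p_gap  <- plot_zone(zone_gap)

# Hypothetical helper: fill colour of the bar drawn at x position `pos`.
fill_of <- function(p, pos) {
  d <- ggplot_build(p)$data[[1]]
  d$fill[d$x == pos]
}

# Class B sits at position 2 in both plots; compare its colour.
same_colour_for_B <- identical(fill_of(p_full, 2), fill_of(p_gap, 2))
```

Because both scales keep all six levels, `same_colour_for_B` should come out `TRUE` even though the second zone is missing both outer classes.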

Upvotes: 2

Paul van Oppen

Reputation: 1495

I think the problem is that you map the computed variable ..x.. to fill. It is a numeric bar position, so the continuous fill scale takes its limits from the range of positions that actually occur in your data, and that range shrinks when an outer class (A or F) is empty. You could map fill to CLASS and use scale_fill_manual instead, and you will get the same plot colours regardless of which levels of your CLASS variable are present:

ggplot(dat) +
  geom_histogram(aes(x = CLASS,
                     y = stat(count / sum(count) * 100),
                     fill = CLASS),
                 stat = 'count', show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  # drop = FALSE keeps unused levels in the scale, so each colour stays tied to its class
  scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"),
                    drop = FALSE) +
  labs(x = 'Class', y = 'Percent') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
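If you would rather keep a continuous palette driven by the bar position, pinning the limits of the fill scale should also stop the rescaling. This is a sketch under my own assumptions: the helper names and the two made-up zones are hypothetical, `after_stat()` needs ggplot2 >= 3.3.0, and I wrap the position in `as.numeric()` so the continuous scale gets a plain numeric vector.

```r
library(ggplot2)

all_classes <- LETTERS[1:6]

# Hypothetical helper: fill by bar position, with fixed scale limits.
plot_zone_continuous <- function(zone_dat) {
  n_classes <- nlevels(zone_dat$CLASS)
  ggplot(zone_dat) +
    geom_bar(aes(x = CLASS,
                 y = after_stat(count / sum(count) * 100),
                 fill = after_stat(as.numeric(x))),
             show.legend = FALSE) +
    scale_x_discrete(drop = FALSE) +
    # Limits span every possible position, not just the ones present.
    scale_fill_viridis_c(limits = c(1, n_classes)) +
    labs(x = "Class", y = "Percent") +
    theme_minimal() +
    theme(panel.grid.minor = element_blank())
}

# Made-up zones: the second one has no A and no F.
zone_full <- data.frame(CLASS = factor(
  rep(all_classes, times = c(40, 15, 85, 100, 5, 5)),
  levels = all_classes))
zone_gap <- data.frame(CLASS = factor(
  rep(all_classes[2:5], times = c(20, 90, 100, 40)),
  levels = all_classes))

p_full_c <- plot_zone_continuous(zone_full)
p_gap_c  <- plot_zone_continuous(zone_gap)

# Hypothetical helper: fill colour of the bar drawn at x position `pos`.
fill_at <- function(p, pos) {
  d <- ggplot_build(p)$data[[1]]
  d$fill[d$x == pos]
}

# Class B is at position 2 in both plots; compare its colour.
same_colour_for_B_c <- identical(fill_at(p_full_c, 2), fill_at(p_gap_c, 2))
```

With the limits fixed at 1 to 6, position 2 is rescaled the same way in both plots, so the colour for B no longer depends on whether A or F has any data.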

Upvotes: 1
