PinkyL

Dynamic conditional colors in ggplot for geom_bar

I have data for different companies, and each company may have a different number of relevant "measures". If a measure falls below the benchmark, its bar should be one color, which I've set to pink; if it is above the benchmark, its bar should be blue. The problem is that companies have different numbers of measures, and any of them could be above or below the benchmark; there is no pattern.

I am using this condition in fill, and it only works some of the time:

ggplot(df, aes(measure)) +
  geom_col(aes(y=company, fill=overall > company)) +
  geom_point(aes(y=overall, color="overall"), size=8, shape=124) +
  scale_color_manual("", values=c("company" = "yellow", "overall"="blue"), labels=c("company" = "Your Company", "overall"= "Overall Benchmark")) +
  coord_flip() + guides(size=FALSE) +
  theme(legend.box="horizontal", legend.key=element_blank(), legend.title=element_blank(), legend.position="top") +
  scale_fill_manual(values=c("lightblue2", "lightpink2"), labels=c("Better","Worse"))

But if the data frame looks like this, for example, the colors are completely off:

df = data.frame(
  measure = c("Measure A","Measure B","Measure C","Measure D"),
  overall = c(9, 5, 11, 19),
  company = c(4, 3, 7, 16)
)

If the data frame looks like this, it's fine:

df2 = data.frame(
  measure = c("Measure A","Measure B", "Measure C"),
  overall = c(9, 5, 11),
  company = c(11,7, 9)
)

I suspect this method doesn't reliably color the bars, but I'm not sure exactly why.

Answers (1)

Z.Lin

Try the following instead:

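library(ggplot2)
library(dplyr)

# Label each measure explicitly, so the colour no longer depends on
# which TRUE/FALSE values happen to be present in the data
plot_df <- df %>%
  mutate(fill = ifelse(overall > company, "Worse", "Better"))

ggplot(plot_df, aes(measure)) +
  geom_col(aes(y = company, fill = fill)) +
  geom_point(aes(y = overall, color = "overall"), size = 8, shape = 124) +
  coord_flip() +
  guides(size = "none") +
  theme(legend.box = "horizontal", legend.key = element_blank(),
        legend.title = element_blank(), legend.position = "top") +
  # Named vector: each label is tied to its own colour
  scale_fill_manual(values = c("Better" = "lightblue2", "Worse" = "lightpink2"))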

Explanation: If you don't specify which fill colour belongs to which value, scale_fill_manual hands the colours out in order to whatever values are present in the data, so the result changes whenever the set of fill values changes.
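To make that ordering concrete, here is a simplified base-R model of the two matching rules. The helpers assign_positional() and assign_named() are hypothetical and only approximate what scale_fill_manual does internally: an unnamed values vector is handed out in order to the levels actually present (sorted, so FALSE comes before TRUE), while a named vector is looked up by value.

```r
# Hypothetical helpers approximating scale_fill_manual's matching rules
# (a simplified model, not ggplot2's actual code).

# Unnamed values: the i-th colour goes to the i-th level present in the data,
# with levels in sorted order (FALSE sorts before TRUE).
assign_positional <- function(x, values) {
  present <- sort(unique(x))
  setNames(values[seq_along(present)], as.character(present))
}

# Named values: each level is looked up by its name, so it does not
# matter which levels happen to be present in the data.
assign_named <- function(x, values) {
  present <- as.character(sort(unique(x)))
  values[present]
}

unnamed <- c("lightblue2", "lightpink2")
named   <- c("FALSE" = "lightblue2", "TRUE" = "lightpink2")

assign_positional(c(TRUE, TRUE), unnamed)   # TRUE -> "lightblue2" (wrong colour)
assign_positional(c(FALSE, TRUE), unnamed)  # FALSE -> "lightblue2", TRUE -> "lightpink2"
assign_named(c(TRUE, TRUE), named)          # TRUE -> "lightpink2" (stable)
```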

In your second case, overall > company evaluates to c(FALSE, FALSE, TRUE) for the 3 measures. FALSE sorts before TRUE, so FALSE gets the first colour (light blue / "Better") and TRUE gets the second (light pink / "Worse"), which happens to be what you want.

In your first case, overall > company evaluates to c(TRUE, TRUE, TRUE, TRUE) for the 4 measures. TRUE is the only value present, so it gets the first colour in the vector, light blue / "Better". Nothing maps to light pink / "Worse" because there is only one fill value.
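You can check both comparisons directly in base R. This just re-evaluates the question's fill expression on the two example data frames and lists the distinct values that the fill scale has to colour.

```r
df <- data.frame(
  measure = c("Measure A", "Measure B", "Measure C", "Measure D"),
  overall = c(9, 5, 11, 19),
  company = c(4, 3, 7, 16)
)
df2 <- data.frame(
  measure = c("Measure A", "Measure B", "Measure C"),
  overall = c(9, 5, 11),
  company = c(11, 7, 9)
)

# The expression the question maps to fill
cmp1 <- with(df, overall > company)
cmp2 <- with(df2, overall > company)

cmp1                # TRUE TRUE TRUE TRUE
cmp2                # FALSE FALSE TRUE

# Distinct values the fill scale sees, in sorted order
sort(unique(cmp1))  # TRUE        -> only the first colour is used
sort(unique(cmp2))  # FALSE TRUE  -> both colours are used
```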

This version creates a fill variable explicitly in the source data, with the labels "Better" / "Worse", and uses a named vector in scale_fill_manual to tie each label to its colour. It works for both of your example data frames.
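If you'd rather keep the logical comparison inside aes() instead of adding a column, the same idea should also work by naming the values vector with the strings "TRUE" and "FALSE". This is a sketch of that alternative, not part of the answer's code; it assumes a reasonably recent ggplot2, and make_plot() is a hypothetical helper used only to build the same plot for both data frames and inspect the computed bar colours.

```r
library(ggplot2)

df <- data.frame(
  measure = c("Measure A", "Measure B", "Measure C", "Measure D"),
  overall = c(9, 5, 11, 19),
  company = c(4, 3, 7, 16)
)
df2 <- data.frame(
  measure = c("Measure A", "Measure B", "Measure C"),
  overall = c(9, 5, 11),
  company = c(11, 7, 9)
)

# Hypothetical helper: the question's plot, but with a *named* fill palette.
# Named values are matched to the data values by name, so TRUE ("Worse")
# is always pink and FALSE ("Better") is always blue.
make_plot <- function(d) {
  ggplot(d, aes(measure)) +
    geom_col(aes(y = company, fill = overall > company)) +
    geom_point(aes(y = overall, color = "overall"), size = 8, shape = 124) +
    scale_fill_manual(
      values = c("FALSE" = "lightblue2", "TRUE" = "lightpink2"),
      labels = c("FALSE" = "Better", "TRUE" = "Worse")
    ) +
    coord_flip() +
    theme(legend.title = element_blank(), legend.position = "top")
}

p1 <- make_plot(df)
p2 <- make_plot(df2)

# Fill colours ggplot2 computed for the bars (layer 1 is geom_col),
# sorted by x position so they line up with Measure A, B, C, ...
bars1 <- ggplot_build(p1)$data[[1]]
bars2 <- ggplot_build(p2)$data[[1]]
bars2 <- bars2[order(unclass(bars2$x)), ]

bars1$fill  # all "lightpink2"
bars2$fill  # "lightblue2" "lightblue2" "lightpink2"
```

Compared with adding a "Better"/"Worse" column, this keeps the data frame unchanged, at the cost of legend labels that have to be mapped from TRUE/FALSE.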
