epi_n00b

Reputation: 150

Create a different color scale for each bar in a ggplot2 stacked bar graph

I have a stacked bar chart that looks like this:

[Figure: Number of patients on each drug by drug class]

While the colors look nice, it is confusing to have so many similar colors representing different drugs. I would like to have a separate color palette for each bar in the graph. For example, class1 could use the palette "Blues" while class2 could use the palette "BuGn" (both are RColorBrewer palette names).

I have found some instances in which people manually coded the colors for each bar, but I'm not sure whether what I'm asking is possible: the colors would need to come from palettes, since there are so many drugs in each drug class.

Code to create the above graph:

library(ggplot2)
library(plyr)
library(RColorBrewer)

drug_name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e", "e", "e", "e",
           "f", "f", "g", "g", "g", "g", "h", "i", "j", "j", "j", "k", "k",
           "k", "k", "k", "k", "l", "l", "m", "m", "m", "n", "o")
df <- data.frame(drug_name)

#get the frequency of each drug name
df_count <- count(df, 'drug_name')

#add a column that specifies the drug class
df_count$drug_class <- vector(mode='character', length=nrow(df_count))

df_count$drug_class[df_count$drug_name %in% c("a", "c", "e", "f")] <- 'class1'

df_count$drug_class[df_count$drug_name %in% c("b", "o")] <- 'class2'

df_count$drug_class[df_count$drug_name %in% c("d", "h", "i")] <- 'class3'

df_count$drug_class[df_count$drug_name %in% c("g", "j", "k", "l", "m", "n")] <- 'class4'

#expand color palette (from http://novyden.blogspot.com/2013/09/how-to-expand-color-palette-with-ggplot.html)

colorCount = length(unique(df_count$drug_name))
getPalette = colorRampPalette(brewer.pal(9, "Set1"))

test_plot <- ggplot(data = df_count, aes(x=drug_class, y=freq, fill=drug_name) ) + geom_bar(stat="identity") + scale_fill_manual(values=getPalette(colorCount))

test_plot

Upvotes: 6

Views: 8457

Answers (2)

danny_C_O_T_W

Reputation: 228

The color palettes in the other answer do not transfer consistently to the different classes. Because the values are assigned to the drug names in alphabetical order (a, b, c, ...), each palette ends up split across several classes. See ?scale_fill_manual for details.

In order to "match" them to each set of bars, we need to order the data.frame by class, and align the color palettes appropriately with the names.
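One way to make that alignment explicit, rather than relying on row order, is to give scale_fill_manual a named vector, since ggplot2 matches named values to the fill levels by name. The following is only a minimal sketch under some assumptions: it rebuilds the question's counts with base R instead of plyr, the light/dark endpoints in `class_ramps` are arbitrary illustrative choices, and `class_of`, `class_ramps` and `named_pal` are hypothetical names introduced here. Base R's colorRampPalette stands in for the Brewer palettes.

```r
# Rebuild the question's counts so the sketch is self-contained
drug_name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e", "e", "e", "e",
               "f", "f", "g", "g", "g", "g", "h", "i", "j", "j", "j", "k", "k",
               "k", "k", "k", "k", "l", "l", "m", "m", "m", "n", "o")
df_count <- as.data.frame(table(drug_name), responseName = "freq",
                          stringsAsFactors = FALSE)

# Lookup from drug to class (same grouping as in the question)
class_of <- c(a = "class1", c = "class1", e = "class1", f = "class1",
              b = "class2", o = "class2",
              d = "class3", h = "class3", i = "class3",
              g = "class4", j = "class4", k = "class4",
              l = "class4", m = "class4", n = "class4")
df_count$drug_class <- unname(class_of[df_count$drug_name])

# Assumed light-to-dark endpoints, one pair per class (illustrative only)
class_ramps <- list(class1 = c("#C6DBEF", "#08519C"),
                    class2 = c("#C7E9C0", "#006D2C"),
                    class3 = c("#FDD0A2", "#A63603"),
                    class4 = c("#DADAEB", "#54278F"))

# For each class, interpolate one colour per drug and name it after the drug
named_pal <- unlist(lapply(names(class_ramps), function(cls) {
  drugs <- df_count$drug_name[df_count$drug_class == cls]
  setNames(colorRampPalette(class_ramps[[cls]])(length(drugs)), drugs)
}))

# Guarded so the palette code above still runs where ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_count, aes(x = drug_class, y = freq, fill = drug_name)) +
    geom_col(colour = "black") +
    scale_fill_manual(values = named_pal)  # matched to drugs by name
  # print(p) draws the chart
}
```

Because the colours are looked up by name, the order of the rows in the data frame no longer matters.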

Create repeating palettes to test correct (expected) ordering.

 # ncol (the number of drugs in each class) is defined in the other answer
 repeating.pal = mapply(function(x,y) brewer.pal(x,y), ncol, c("Set2","Set2","Set2","Set2"))

 repeating.pal[[2]] = repeating.pal[[2]][1:2]  # We only need 2 colors but brewer.pal creates 3 minimum

 repeating.pal = unname(unlist(repeating.pal))

Sort the data according to class (the order we want the colors to remain in!)

 df_count_sorted <- df_count[order(df_count$drug_class),]

Copy the original ordering of the drug names.

 df_count_sorted$labOrder <- df_count$drug_name

Add in test color palette.

 df_count_sorted$colours <- repeating.pal

Alter the plot routine, with fill = labOrder (cum.freq is the label-position column computed in the other answer).

ggplot(data = df_count_sorted, aes(x=drug_class, y=freq, fill=labOrder) ) + 
  geom_bar(stat="identity", colour="black", lwd=0.2) + 
  geom_text(aes(label=paste0(drug_name,": ", freq), y=cum.freq), colour="grey20") +
  scale_fill_manual(values=df_count_sorted$colours) +
  guides(fill="none")

Palette follows expected order

Upvotes: 4

eipi10

Reputation: 93851

With so many colors, your plot is going to be confusing. It's probably better to just label each bar section with the drug name and the count. The code below shows one way to make separate palettes for each bar and also how to label the bars.

First, add a column that we'll use for positioning the bar labels:

library(dplyr) # for the chaining (%>%) operator

## Add a column for positioning drug labels on graph
df_count = df_count %>% group_by(drug_class) %>%
  mutate(cum.freq = cumsum(freq) - 0.5*freq)

Second, create the palettes. The code below uses four different ColorBrewer palettes, but you can use any combination of palette-creating functions or methods to control the colors as finely as you wish.

## Create separate palette for each drug class

# Count the number of colors we'll need for each bar
ncol = table(df_count$drug_class)

# Make the palettes
pal = mapply(function(x,y) brewer.pal(x,y), ncol, c("BrBG","OrRd","YlGn","Set2"))
pal[[2]] = pal[[2]][1:2]  # We only need 2 colors but brewer.pal creates 3 minimum
pal = unname(unlist(pal)) # Combine palettes into single vector of colors

ggplot(data = df_count, aes(x=drug_class, y=freq, fill=drug_name) ) + 
  geom_bar(stat="identity", colour="black", lwd=0.2) + 
  geom_text(aes(label=paste0(drug_name,": ", freq), y=cum.freq), colour="grey20") +
  scale_fill_manual(values=pal) +
  guides(fill="none")

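The manual trimming step is needed because brewer.pal never returns fewer than three colours. A small wrapper can hide that detail. The sketch below assumes RColorBrewer is installed; `brewer_n` and `n_per_class` are hypothetical names introduced here, not part of the package, and the per-class counts (4, 2, 3, 6) are typed in by hand from the question's data. For requests larger than a palette's size it falls back to interpolating with colorRampPalette, as the question's own code does.

```r
library(RColorBrewer)

# Hypothetical helper: return exactly n colours from a Brewer palette
brewer_n <- function(n, name) {
  max_n <- brewer.pal.info[name, "maxcolors"]
  if (n < 3) {
    # brewer.pal() returns at least 3 colours, so request 3 and keep the first n
    brewer.pal(3, name)[seq_len(n)]
  } else if (n <= max_n) {
    brewer.pal(n, name)
  } else {
    # Beyond the palette's size, interpolate between its colours
    colorRampPalette(brewer.pal(max_n, name))(n)
  }
}

# Number of drugs per class in the question's data (assumed, typed by hand)
n_per_class <- c(class1 = 4, class2 = 2, class3 = 3, class4 = 6)

# One palette per class, flattened into a single vector of colours
pal <- unlist(Map(brewer_n, n_per_class, c("BrBG", "OrRd", "YlGn", "Set2")),
              use.names = FALSE)
length(pal)
```

With a helper like this, no per-class special case is needed when a class has only one or two drugs.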

There are many strategies and functions for creating color palettes. Here's another method, using the hcl function:

lum = seq(100, 50, length.out=4)    # Vary the luminance for each bar
shift = seq(20, 60, length.out=4)  # Shift the hues for each bar

pal2 = mapply(function(n, l, s) hcl(seq(0 + s, 360 + s, length.out=n+1)[1:n], 100, l), 
              ncol, lum, shift)
pal2 = unname(unlist(pal2))
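The snippet above stops once `pal2` is built. Here is a self-contained sketch of the same construction, assuming the per-class drug counts from the question (4, 2, 3 and 6) are typed in by hand as `n_per_class` rather than taken from table(). That name and `pal2_by_class` are introduced here purely for illustration.

```r
# Drugs per class in the question's data (assumed, typed by hand)
n_per_class <- c(class1 = 4, class2 = 2, class3 = 3, class4 = 6)

lum   <- seq(100, 50, length.out = 4)  # one luminance per class
shift <- seq(20, 60, length.out = 4)   # one hue offset per class

# n evenly spaced hues around the colour wheel, offset by s, at luminance l
pal2_by_class <- Map(function(n, l, s) {
  hcl(h = seq(0 + s, 360 + s, length.out = n + 1)[1:n], c = 100, l = l)
}, n_per_class, lum, shift)

# Flatten into a single vector of colours
pal2 <- unlist(pal2_by_class, use.names = FALSE)

# One list entry per class, one colour per drug
lengths(pal2_by_class)
```

`pal2` can then be passed to scale_fill_manual(values=pal2) in place of `pal`. As with any unnamed vector, the colours are assigned to the fill levels in their sorted order.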

Upvotes: 7
