user1471980

Reputation: 10626

increasing the legend items in ggplot2

I have this data frame:

head(x)

      Date Company   Region Units
1 1/1/2012 Gateway  America     0
2 1/1/2012 Gateway   Europe     0
3 1/1/2012 Gateway  America     0
4 1/1/2012 Gateway Americas     0
5 1/1/2012 Gateway   Europe     0
6 1/1/2012 Gateway  Pacific     0

dput(x)

    structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/2012", 
"1/12/2012", "1/2/2012"), class = "factor"), Company = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Gateway", "HP", "IBM"), class = "factor"), 
    Region = structure(c(1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 
    1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 
    2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 
    3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 
    4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 
    2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 2L
    ), .Label = c("America", "Americas", "Europe", "Pacific"), class = "factor"), 
    Units = c(1L, 3L, 1L, 6L, 20L, 2L, 2L, 10L, 2L, 1L, 2L, 4L, 
    6L, 30L, 2L, 15L, 10L, 3L, 4L, 7L, 9L, 12L, 34L, 50L, 3L, 
    2L, 4L, 3L, 1L, 3L, 3L, 1L, 4L, 0L, 1L, 0L, 0L, 1L, 0L, 4L, 
    0L, 0L, 0L, 0L, 5L, 0L, 8L, 0L, 0L, 0L, 0L, 0L, 9L, 0L, 56L, 
    10L, 0L, 0L, 5L, 7L, 0L, 0L, 8L, 0L, 2L, 0L, 4L, 0L, 5L, 
    7L, 0L, 0L, 8L, 10L, 0L, 6L, 0L, 4L, 4L, 0L, 2L, 0L, 5L, 
    0L)), .Names = c("Date", "Company", "Region", "Units"), class = "data.frame", row.names = c(NA, 
-84L))

I would like to create a heat map:

ggplot(x, aes(Date, Company, fill=Units)) + geom_tile(aes(fill=Units)) + facet_grid(~Region) + scale_fill_gradient(low="white", high="red")

This command works, but I need to use colors other than white and red and to increase the number of steps on the legend scale. By default the legend shows 5 values; I would like to increase that to 10. 0 should be white, and the other values should be distinctly different from white so that users will notice them.

How would I increase the number of legend values using ggplot and assign a different color to each legend entry?

Upvotes: 2

Views: 251

Answers (1)

Arun

Reputation: 118799

I find it very informative to use quantiles when plotting heatmaps, as done in this blog post. This helps to generate skewed color sets (as shown there). When the data contain a large share of zeros, as yours do, choosing appropriate quantiles gives a skewed color map which, with appropriate labels, is both visually clear and informative. I've adapted the code from that blog post for this problem and added a bit more explanation. The blog post deserves all the credit for the idea and implementation.

Before going into the code, we need to look at the quantiles of your data to decide which ones to use:

quantile(x$Units, seq(0, 1, length.out = 25))

#      0% 4.166667% 8.333333%     12.5% 16.66667% 20.83333%       25% 29.16667% 33.33333% 
# 0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000 
#   37.5% 41.66667% 45.83333%       50% 54.16667% 58.33333%     62.5% 66.66667% 70.83333% 
# 1.00000   1.00000   2.00000   2.00000   3.00000   3.00000   4.00000   4.00000   5.00000 
#     75% 79.16667% 83.33333%     87.5% 91.66667% 95.83333%      100% 
# 6.00000   7.00000   8.00000   9.62500  10.16667  25.41667  56.00000 

You can see that the 0% quantile corresponds to Units = 0, and the quantiles stay at 0 up to 33.33%. So we could choose 38% as the next break, then, say, 60%, 75% and 90%, and finish with 100%. That gives five classes, with breaks at levels that make sense for your data.
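
As a minimal sketch of that reasoning, the share of zeros in the data tells you roughly where the first non-zero break has to sit. The `units` vector below is a small made-up stand-in for x$Units, not the data from the question, and the probabilities in `probs` are likewise chosen only for illustration.

```r
# Hypothetical zero-heavy counts standing in for x$Units (an assumption,
# not the data from the question)
units <- c(0, 0, 0, 0, 1, 2, 3, 5, 8, 56)

# Fraction of observations that are exactly zero
zero_share <- mean(units == 0)
zero_share

# Quantiles at probabilities well below that share are all 0, so the
# second break should use a probability above it
probs  <- c(0, 0.5, 0.75, 1)
breaks <- quantile(units, probs = probs)
breaks
```

If two neighbouring breaks come out equal, the class between them is empty, which is the situation the choice of 38% above is meant to avoid.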

We'll need the zoo package to accomplish this. Let's construct the data now:

library(zoo) # for rollapply
# the quantiles chosen above, used to split the data into classes
qtiles    <- quantile(x$Units, probs = c(0, 38, 60, 75, 90, 100)/100)
# a color palette with one color per class
c_pal     <- colorRampPalette(c("#3794bf", "#FFFFFF", 
                         "#df8640"))(length(qtiles)-1)
# since we are using quantile classes for fill levels, 
# we have to generate the matching labels ("lower : upper")
labels    <- rollapply(round(qtiles, 2), width = 2, by = 1, 
                      FUN = function(i) paste(i, collapse = " : "))
# add the index of the quantile interval each value falls in; 
# this is what is used for fill
x$q.units <- findInterval(x$Units, qtiles, all.inside = TRUE)

# Now plot
library(ggplot2)
p <- ggplot(data = x, aes(x = Date, y = Company, fill = factor(q.units)))
p <- p + geom_tile(color = "black")
p <- p + scale_fill_manual(values = c_pal, name = "", labels = labels)
p <- p + facet_grid( ~ Region)
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
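
To see what the label and class steps in the code above do, here is a small base R sketch. The breakpoints in `brks` are invented for illustration (they are not the quantiles of the question's data), and the `paste(head(), tail())` idiom is an alternative to `rollapply` that avoids the zoo dependency.

```r
# Hypothetical class boundaries; in the answer these come from quantile()
brks <- c(0, 1, 3, 6, 10, 56)

# Pair each boundary with the next one to label the classes
lbls <- paste(head(brks, -1), tail(brks, -1), sep = " : ")

# Class index for a few example values; all.inside = TRUE keeps the
# maximum (56) in the last class instead of opening a new one
vals <- c(0, 1, 5, 56)
cls  <- findInterval(vals, brks, all.inside = TRUE)

data.frame(value = vals, class = cls, label = lbls[cls])
```

Each class is closed on the left, so a value equal to a boundary (1 here) lands in the class that starts at it. That is why zeros get a class of their own when the second break is the first non-zero quantile.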

You get this: [image: ggplot2_heatmap_skewed]

Hope this helps.

Edit: You can also visit colorbrewer2.org to get nice palettes and set the colors yourself. For example:

# try out these colors:
c_pal     <- c("#EDF8FB", "#B3CDE3", "#8C96C6", "#8856A7", "#810F7C")
c_pal     <- c("#FFFFB2", "#FECC5C", "#FD8D3C", "#F03B20", "#BD0026")
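
If you would rather not copy hex codes by hand, the RColorBrewer package ships the ColorBrewer palettes. This is only a sketch and assumes RColorBrewer is installed; the palette names "BuPu" and "YlOrRd" are my choice of five-class sequential schemes, and any other name listed in brewer.pal.info can be swapped in the same way.

```r
library(RColorBrewer)

# Five-class sequential palettes, one color per quantile class
c_pal_purple <- brewer.pal(n = 5, name = "BuPu")
c_pal_red    <- brewer.pal(n = 5, name = "YlOrRd")

# Preview the hex codes before passing one of them to
# scale_fill_manual(values = ...)
c_pal_purple
c_pal_red
```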

Also, try setting alpha in the tile layer, e.g. geom_tile(color = "black", alpha = 0.5).

Upvotes: 3
