Developer

Reputation: 976

How to control colors and breaks in heatmap using ggplot?

I am trying to make a heatmap using the ggplot2 package, and I have trouble controlling the colors and breaks. I have 18 questions, 22 firms, and the mean value of each firm's responses on a 1 to 5 scale.

Say I want the ranges (0-1), (1-2), (2-3), (3-4) and (4-5) to be color coded, either with distinct colors (blue, green, red, yellow, purple) or on a gradient scale, with NA values shown in black. In short: how do I choose the colors and breaks?

I would also like to fix the order on the axes to "Question1, Question2, ..., Question18", and likewise for the firms. I believe the problem is that the column is of class "factor".

> head(mydf, 20)
   Firm   Question             Value
1     1  Question1   3.6675482217047
2     1  Question2  3.74327628361858
3     1  Question3              <NA>
4     1  Question4              <NA>
5     1  Question5              <NA>
6     1  Question6              <NA>
7     1  Question7 0.352078239608802
8     1  Question8  3.04180471049169
9     1  Question9   3.9559090659924
10    1 Question10              <NA>
11    1 Question11                 1
12    1 Question12  4.26591296778731
13    1 Question13  3.95256943635996
14    1 Question14 0.465686274509804
15    1 Question15  2.61764705882353
16    1 Question16  1.83333333333333
17    1 Question17              <NA>
18    1 Question18 0.225490196078431
19    2  Question1  3.85714285714286
20    2  Question2                 4

> ggplot(mydf, aes(Question, Firm, fill=Value)) + geom_tile() + theme(axis.text.x = element_text(angle=330, hjust=0)) 

Link to a picture of my current plot: https://i.sstatic.net/BBb3x.jpg

Upvotes: 1

Views: 1179

Answers (1)

Joe

Reputation: 3991

The root of your problem appears to be that Value is a factor rather than a numeric vector. I infer this from the head() output: when R prints a data frame, missing values appear as <NA> only in factor or character columns, while in a numeric column they appear as NA. The image you link to shows ggplot's default coloring for a factor (one discrete color per level); the default coloring for a numeric variable is a continuous gradient, which is much closer to what you want.

You can check whether this is the case with class(mydf$Value). If it is a factor, convert it to numeric with the following:

# as.character() first; otherwise as.numeric() returns the factor's level codes
mydf$Value <- as.numeric(as.character(mydf$Value))
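A small sketch with made-up values showing why the as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes rather than the numbers that are printed.

```r
# Made-up values stored as a factor, as Value appears to be in the question
v <- factor(c("3.67", "1", NA, "0.35"))

class(v)     # "factor"
levels(v)    # levels are sorted as text: "0.35" "1" "3.67"

# Wrong: as.numeric() on a factor returns the level codes, not the values
codes <- as.numeric(v)                 # 3 2 NA 1

# Right: convert to character first, then to numeric; NA stays NA
vals <- as.numeric(as.character(v))    # 3.67, 1, NA, 0.35
```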

Your plotting code as written will now return a graph in which the tiles are colored on a continuous gradient, with missing values drawn in gray.

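You can fine-tune the colors with a gradient scale such as scale_fill_gradient(), or bin the values and add a manual scale with scale_fill_manual(). Both accept na.value to set the color of the missing tiles.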
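A minimal sketch of both options, using a small made-up data frame in place of mydf. The column names follow the question; the data values, the extra Bin column, and the gradient endpoint colors are assumptions for illustration. cut() turns the numbers into the five ranges, scale_fill_manual() assigns one color per range, and na.value paints missing tiles black.

```r
library(ggplot2)

# Hypothetical toy data standing in for mydf (Value already numeric)
mydf <- data.frame(
  Firm = factor(rep(1:2, each = 3)),
  Question = factor(rep(c("Question1", "Question2", "Question3"), 2)),
  Value = c(3.67, 0.35, NA, 4.5, 1.8, 2.2)
)

# Option 1: bin Value into [0,1], (1,2], ..., (4,5]
# (intervals are right-closed; include.lowest keeps an exact 0 in the first bin)
mydf$Bin <- cut(mydf$Value, breaks = 0:5, include.lowest = TRUE)

# One color per bin, named by bin label so the mapping is explicit
bin_colors <- c("blue", "green", "red", "yellow", "purple")
names(bin_colors) <- levels(mydf$Bin)

p_binned <- ggplot(mydf, aes(Question, Firm, fill = Bin)) +
  geom_tile() +
  # drop = FALSE keeps empty bins in the legend; NA tiles are drawn black
  scale_fill_manual(values = bin_colors, na.value = "black", drop = FALSE)

# Option 2: keep Value continuous and use a two-color gradient over 0..5
p_gradient <- ggplot(mydf, aes(Question, Firm, fill = Value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkblue", limits = c(0, 5),
                      breaks = 0:5, na.value = "black")
```

Print either object (p_binned or p_gradient) to draw it. The binned version gives exactly one color per range; the gradient keeps more detail but makes neighboring values harder to tell apart.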

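As for your other question, the axis order follows the factor's levels, which by default are sorted alphabetically (Question1, Question10, ..., Question18, Question2, ..., Question9). Reordering the levels is simple. Adapted from R-bloggers: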

# alphabetical levels: 1 = Question1, 2:10 = Question10..18, 11:18 = Question2..9
mydf$Question <- factor(mydf$Question, levels = levels(mydf$Question)[c(1, 11:18, 2:10)])
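Relying on index positions is fragile: it breaks if a question is missing from the data. A sketch that sets the level order explicitly instead, assuming every question label has the form "QuestionN" with N from 1 to 18 and that the firm labels are numbers (the toy data below is made up):

```r
# Hypothetical toy data; Question and Firm start as factors with the
# default (alphabetical) level order
mydf <- data.frame(
  Firm = factor(c("1", "2", "10", "1")),
  Question = factor(c("Question1", "Question10", "Question2", "Question18"))
)

# Default alphabetical order puts Question10 and Question18 before Question2
levels(mydf$Question)

# Spell out the desired order; labels not listed here would become NA
mydf$Question <- factor(mydf$Question, levels = paste0("Question", 1:18))

# For firms, sort the existing labels numerically rather than as text
mydf$Firm <- factor(mydf$Firm,
                    levels = as.character(sort(as.numeric(levels(mydf$Firm)))))
```

On a discrete y axis ggplot draws the first level at the bottom, so wrap the Firm levels in rev() if you want Firm 1 at the top.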

Upvotes: 1
