Two colors ordered heatmap

Question

Starting from the example in "R heatmap ggplot2 ordered as data file", I've changed the data input to the one below, where every value in the second column is 0, so that column should be filled in white:

people,1,2,3
Ej1,1,0,0
Ej2,0,0,1
Ej3,0,0,1
Ej4,1,0,0

Using the same code as in the post

library(reshape2)
library(ggplot2)
library(scales)
library(plyr)
data <- read.csv("fruits.txt", head=TRUE, sep=",")
data$people <- factor(data$people, levels=rev(data$people))
data.m = melt(data)
data.m <- ddply(data.m, .(variable), transform, rescale = rescale(value))
p <- ggplot(data.m, aes(variable, people)) +
  geom_tile(aes(fill = rescale), colour = "white")
p + scale_fill_gradient(low = "white", high = "steelblue")

But the X2 variable column is filled with a different color instead of white as shown in the image.

I tried replacing scale_fill_gradient(low = "white", high = "steelblue") with scale_color_gradient, but couldn't work out how.

example plot

Rusan Kax · Accepted Answer

The problem appears to be how rescale is being applied to your data frame. For a vector x, rescale(x) checks whether the range of x is 0, and if so it returns the mean of the to range, which defaults to to=c(0,1), giving 0.5:

rescale(rep(0,4),to=c(0,1))
[1] 0.5 0.5 0.5 0.5

ddply splits data.m by variable and applies rescale to each group separately. For the X2 group, the range of value is 0, as in the case above.

The rescale column in data.m therefore shows 0.5 for X2, so ggplot is plotting the data it was given correctly.

   people variable value rescale
1     Ej1       X1     1     1.0
2     Ej2       X1     0     0.0
3     Ej3       X1     0     0.0
4     Ej4       X1     1     1.0
5     Ej1       X2     0     0.5
6     Ej2       X2     0     0.5
7     Ej3       X2     0     0.5
8     Ej4       X2     0     0.5
9     Ej1       X3     0     0.0
10    Ej2       X3     1     1.0
11    Ej3       X3     1     1.0
12    Ej4       X3     0     0.0
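The per-column behaviour can be checked without fruits.txt. The following is a minimal sketch, assuming the scales package is installed; it reads the question's data from an inline string and uses sapply as a stand-in for the ddply grouping, so the name per_column is only illustrative.

```r
library(scales)

# Inline copy of the question's data, so no fruits.txt is needed.
# read.csv renames the numeric headers 1,2,3 to X1,X2,X3.
data <- read.csv(text = "people,1,2,3
Ej1,1,0,0
Ej2,0,0,1
Ej3,0,0,1
Ej4,1,0,0")

# Rescale each column on its own, which is what grouping by variable does.
per_column <- sapply(data[, -1], rescale)

# X1 and X3 keep their 0/1 values; X2 has zero range, so every entry
# becomes the midpoint of the default to = c(0, 1), i.e. 0.5.
print(per_column)
```

The X2 column of per_column matches rows 5 to 8 of the table above.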

One way around this is to drop ddply here and operate on the data frame directly, so that rescale sees the entire value column (rather than one .(variable) group at a time). The range of the whole column is not 0, which avoids the zero-range problem for X2.

library(reshape2)
library(ggplot2)
library(scales)
library(plyr)
data <- read.csv("fruits.txt", head=TRUE, sep=",")
data$people <- factor(data$people, levels=rev(data$people))
data.m = melt(data)
#data.m <- ddply(data.m, .(variable), transform, rescale = rescale(value))
data.m[, "rescale"] <- rescale(data.m[, "value"], to=c(0,1))
p <- ggplot(data.m, aes(variable, people)) +
  geom_tile(aes(fill = rescale), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")
p

ggplot of data.m based on fruit.txt data
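If per-column rescaling is still wanted, another option (not part of the answer above) may be to give rescale an explicit input range through its from argument, so it no longer derives the range from each group. This sketch assumes the values are known to lie between 0 and 1, as in the question's data; the vector names are hypothetical.

```r
library(scales)

x2 <- c(0, 0, 0, 0)  # the all-zero X2 column
x3 <- c(0, 1, 1, 0)  # the X3 column

# Default: from = range(x2), which has zero width, so the midpoint 0.5 is returned.
default_x2 <- rescale(x2)

# Explicit input range (assumed to be 0..1): zeros stay at 0.
fixed_x2 <- rescale(x2, from = c(0, 1))
fixed_x3 <- rescale(x3, from = c(0, 1))

print(default_x2)
print(fixed_x2)
print(fixed_x3)
```

The same argument could be passed inside the original ddply call, as rescale = rescale(value, from = c(0, 1)), in which case X2 would map to 0 and be drawn in white.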
