wallisthedog

Reputation: 1

NA value breaks ggplot2 heatmap?

I'm using ggplot2 to generate a heatmap, but NA values cause the heatmap to be all one color.

Example dataframe:

id<-as.factor(c(1:5))
year<-as.factor(c("Y13", "Y14", "Y15"))
freq<-c(26, 137, 166, 194, 126, 8, 4, 76, 20, 92, 4, NA, 6, 6, 17)
test<-data.frame(id, year, freq)

  test

  id year freq
  1  Y13   26
  2  Y14  137
  3  Y15  166
  4  Y13  194
  5  Y14  126
  1  Y15    8
  2  Y13    4
  3  Y14   76
  4  Y15   20
  5  Y13   92
  1  Y14    4
  2  Y15   NA
  3  Y13    6
  4  Y14    6
  5  Y15   17

I used the following for the heatmap:

# load packages (brewer.pal comes from RColorBrewer)
library(ggplot2)
library(RColorBrewer)

# set color palette
jBuPuFun <- colorRampPalette(brewer.pal(n = 9, "RdBu"))
paletteSize <- 256
jBuPuPalette <- jBuPuFun(paletteSize)

# heatmap

ggplot(test, aes(x = year, y = id, fill = freq)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  geom_tile() +
  scale_fill_gradient2(low = jBuPuPalette[1],
                       mid = jBuPuPalette[paletteSize/2],
                       high = jBuPuPalette[paletteSize],
                       midpoint = (max(test$freq) + min(test$freq)) / 2,
                       name = "Number of Violations")

The result is a gray color over the entire heatmap.

When I remove the NA from the data frame, the heatmap renders correctly.

I've experimented with this by explicitly assigning a color to the NA values, for example with na.value:

scale_fill_gradient2(low = jBuPuPalette[1],
                     mid = jBuPuPalette[paletteSize/2],
                     high = jBuPuPalette[paletteSize],
                     na.value = "yellow",
                     midpoint = (max(test$freq) + min(test$freq)) / 2,
                     name = "Number of Violations")

However, that just made the entire heatmap yellow.

Am I missing something obvious? Any suggestions are appreciated.

Thanks.

Upvotes: 0

Views: 1582

Answers (1)

Gregor Thomas

Reputation: 145975

Comment to answer:

ggplot deals with NAs just fine. The problem is that min and max return NA by default if the vector contains any NA, so the midpoint you pass to the scale is NA. You just need to set na.rm = TRUE for both when you define the midpoint of your scale:

midpoint = (max(test$freq, na.rm = TRUE) + min(test$freq, na.rm = TRUE)) / 2,
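
If it helps to see the behavior in isolation, here is a minimal base R sketch using the freq values from the question. The names midpoint_default, midpoint_fixed, and midpoint_range are only illustrative and are not part of the original code:

```r
# Same values as the question's freq column, including the single NA.
freq <- c(26, 137, 166, 194, 126, 8, 4, 76, 20, 92, 4, NA, 6, 6, 17)

# Without na.rm, one NA is enough to make max() and min() return NA,
# so the midpoint computed from them is NA as well.
midpoint_default <- (max(freq) + min(freq)) / 2
print(midpoint_default)   # [1] NA

# With na.rm = TRUE the NA is dropped before the extremes are computed.
midpoint_fixed <- (max(freq, na.rm = TRUE) + min(freq, na.rm = TRUE)) / 2
print(midpoint_fixed)     # [1] 99

# range() returns c(min, max) in one call, so this is an equivalent form.
midpoint_range <- mean(range(freq, na.rm = TRUE))
```

Either of the last two forms should work as the midpoint argument. The single NA cell is then presumably the only tile drawn with na.value.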

Upvotes: 1
