pomegranate

Reputation: 765

Basic questions about using ggplot2 to make heatmaps

I'm trying to learn how to generate heat maps in R, so sorry if these questions seem really basic. Let's say I have this table (a bit contrived, but I'm just trying to practice here):

  NumHours FavePet FaveFood
1        3     Cat   Burger
2        2     Cat    Pizza
3        5    Fish    Pizza
4        2     Dog    Pizza
5        4    Fish    Apple
6        3     Dog   Burger
7        3     Cat    Pizza
8        1     Cat   Burger
9        6     Dog    Apple

The dput structure is below:

structure(list(NumHours = c(3L, 2L, 5L,2L, 4L, 3L, 3L, 1L, 6L), 
FavePet = structure(c(2L, 2L, 3L, 1L, 3L, 1L, 2L, 2L, 1L), 
.Label = c("Dog",  "Cat", "Fish"), class = "factor"), 
FaveFood = structure(c(3L, 2L, 2L, 2L, 1L, 3L, 2L, 3L, 1L), 
.Label = c("Apple", "Pizza", "Burger"), class = "factor")), 
.Names = c("NumHours", "FavePet", "FaveFood"), row.names = c(NA, 9L), class = "data.frame")

I'd like to generate a heat map where FaveFood is on the x-axis, FavePet is on the y-axis, and the average number of hours for each pair sets the intensity of the color. For example, since there are two "Cat Pizza" values (2, 3), a color corresponding to 2.5 would be plotted, and this would be lighter than the color for Dog Apple, which has a value of 6.

So far, I have the following, which creates the correct structure, but doesn't incorporate averages (not sure where to put it... it's probably something like fun.y = mean, but I'm not applying it to y or x, so I don't know how to call it).

ggplot(df, aes(x=FaveFood, y=FavePet, fill=as.factor(NumHours))) + geom_tile(aes(color="white"))

I'd also like the colors to range from yellow to red, based on the value so I added

+ scale_fill_gradient(low="yellow", high="red")

But this leads to this error, which I'm not sure how to fix.

Error: Discrete value supplied to continuous scale

Your help is really appreciated! I'd like to learn how to do this properly :)

Upvotes: 1

Views: 138

Answers (2)

spsaaibi

Reputation: 452

First, you could use the mutate function from dplyr to add a new variable, AvgHours, which holds the mean of NumHours for each FavePet/FaveFood pair.

library(dplyr)
df <- df %>% group_by(FavePet, FaveFood) %>% mutate(AvgHours = mean(NumHours))

Then you can use ggplot's geom_tile to plot the desired heatmap.

ggplot(df, aes(FaveFood,FavePet)) + geom_tile(aes(fill = AvgHours)) + scale_fill_gradient(low = "yellow", high = "red")
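
One possible variant, offered as a sketch rather than the definitive way: mutate() keeps all nine rows, so pairs that occur more than once are drawn as several identical tiles on top of each other. Using summarise() instead collapses the data to one row per pair. The name avg_df is made up for this example, the data frame is re-typed from the question's table (so the factor levels come out alphabetical rather than in the dput order), and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Data re-typed from the question's table
df <- data.frame(
  NumHours = c(3L, 2L, 5L, 2L, 4L, 3L, 3L, 1L, 6L),
  FavePet  = factor(c("Cat", "Cat", "Fish", "Dog", "Fish", "Dog", "Cat", "Cat", "Dog")),
  FaveFood = factor(c("Burger", "Pizza", "Pizza", "Pizza", "Apple",
                      "Burger", "Pizza", "Burger", "Apple"))
)

# One row per FavePet/FaveFood pair, holding the mean of NumHours
avg_df <- df %>%
  group_by(FavePet, FaveFood) %>%
  summarise(AvgHours = mean(NumHours), .groups = "drop")

# AvgHours is numeric, so the continuous yellow-to-red scale applies
p <- ggplot(avg_df, aes(FaveFood, FavePet)) +
  geom_tile(aes(fill = AvgHours), colour = "white") +
  scale_fill_gradient(low = "yellow", high = "red")
print(p)
```

Pairs that never occur in the data (for example Fish with Burger) simply have no row in avg_df, so their tiles are left empty.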

Upvotes: 0

MichaelVE

Reputation: 1344

Try a basic heatmap like:

ggplot(df, aes(FaveFood, FavePet)) + 
  geom_tile(aes(fill = NumHours),  colour = "black") + 
  scale_fill_gradient(name = "NumHours", low = "yellow",  high = "red") +
  labs(title = "Heatmap FaveFood and FavePet")+
  labs(x = "FaveFood", y = "FavePet")
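
Note that the plot above fills each tile with the raw NumHours values. As far as I can tell, rows that share the same pair (such as the two Cat/Pizza rows) are drawn on top of each other, so the tile shows one of the raw values rather than their mean. If you want the averages the question asks for, one option is to aggregate first with base R. This is only a sketch: avg_df is a name I made up, and the data frame is re-typed from the question's table.

```r
library(ggplot2)

# Data re-typed from the question's table
df <- data.frame(
  NumHours = c(3L, 2L, 5L, 2L, 4L, 3L, 3L, 1L, 6L),
  FavePet  = factor(c("Cat", "Cat", "Fish", "Dog", "Fish", "Dog", "Cat", "Cat", "Dog")),
  FaveFood = factor(c("Burger", "Pizza", "Pizza", "Pizza", "Apple",
                      "Burger", "Pizza", "Burger", "Apple"))
)

# Mean of NumHours for every FavePet/FaveFood combination present in the data;
# the result keeps the column name NumHours, now holding the means
avg_df <- aggregate(NumHours ~ FavePet + FaveFood, data = df, FUN = mean)

p <- ggplot(avg_df, aes(FaveFood, FavePet)) +
  geom_tile(aes(fill = NumHours), colour = "black") +
  scale_fill_gradient(name = "Mean NumHours", low = "yellow", high = "red") +
  labs(title = "Heatmap FaveFood and FavePet", x = "FaveFood", y = "FavePet")
print(p)
```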

There is a reason that you get the error:

Error: Discrete value supplied to continuous scale

This happens because scale_fill_gradient is a continuous scale, but fill=as.factor(NumHours) turns your numeric values into a factor, which is discrete. ggplot2 cannot build a gradient from a factor, so that is where it went wrong. Map fill to the numeric column directly and the gradient will work.
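
To see the difference for yourself, here is a small base R sketch (the variable names hours and f are just for illustration). It also shows a common pitfall when converting a factor back to numbers.

```r
hours <- c(3L, 2L, 5L)
f <- as.factor(hours)

is.numeric(hours)   # TRUE: fine for a continuous scale
is.numeric(f)       # FALSE: a factor is discrete
levels(f)           # "2" "3" "5"

# Pitfall: as.numeric() on a factor returns the internal level codes
as.numeric(f)                 # 2 1 3

# Going through character recovers the original values
as.numeric(as.character(f))   # 3 2 5
```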

Good luck!

Upvotes: 1
