Leonardo Viotti

Reputation: 506

ggplot2 heatmap with tile height and width as aes()

I'm trying to create a heat map for an origin-destination (OD) matrix, and I want to scale the rows and columns by certain weights. Since these weights are constant within each category, I would expect the plot to keep its row and column structure.

library(dplyr)    # provides %>% and rename()
library(ggplot2)

# Tidy OD matrix
df <- data.frame (origin  = c(rep("A", 3), rep("B", 3),rep("C", 3)),
                  destination = rep(c("A","B","C"),3),
                  value = c(0, 1, 10, 5, 0, 11, 15, 6, 0))

# Weights
wdf <- data.frame(region = c("A","B","C"),
                  w = c(1,2,3))

# Add weights to the data.
plot_df <- df %>% 
  merge(wdf %>% rename(w_origin = w), by.x = 'origin', by.y = 'region') %>% 
  merge(wdf %>% rename(w_destination = w), by.x = 'destination', by.y = 'region')
  

Here's what the data looks like:

> plot_df
  destination origin value w_origin w_destination
1           A      A     0        1             1
2           A      C    15        3             1
3           A      B     5        2             1
4           B      A     1        1             2
5           B      B     0        2             2
6           B      C     6        3             2
7           C      B    11        2             3
8           C      A    10        1             3
9           C      C     0        3             3

However, when I pass the weights as width and height in aes(), I get this:

ggplot(plot_df, 
       aes(x = destination, 
           y = origin)) +
  geom_tile(
    aes(
      width = w_destination,
      height = w_origin,
      fill = value),
    color = 'black')

[Image: the resulting plot]

It seems to be working for the column sizes (width), but not quite, because the proportions are not right. The rows are all over the place and not aligned.

I'm only using geom_tile because I can pass height and width as aesthetics, but I'm open to other suggestions.

Upvotes: 1

Views: 2756

Answers (2)

stefan

Reputation: 125572

The issue is that your tiles are overlapping. While you can pass the width and height as aesthetics, geom_tile will not adjust the x and y positions of the tiles for you. As you are mapping a discrete variable on x and y, your tiles are positioned on an equidistant grid. In your case the tiles are centered at 1, 2 and 3. The tiles are then drawn at these positions with the specified width and height.

This can easily be seen by adding some transparency to your plot:

library(ggplot2)
library(dplyr)

ggplot(plot_df, 
       aes(x = destination, 
           y = origin)) +
  geom_tile(
    aes(
      width = w_destination,
      height = w_origin,
      fill = value), color = "black", alpha = .2)
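
The overlap can also be checked numerically. The sketch below is only an illustration, not part of ggplot2's API: it assumes the three regions sit at the integer positions 1, 2 and 3 that a discrete scale assigns to its levels, and computes the span each tile would cover along one axis given its width. `extents` and `overlaps_next` are hypothetical names.

```r
# Weights from the question; on a discrete scale, A, B and C sit at 1, 2 and 3
wdf <- data.frame(region = c("A", "B", "C"), w = c(1, 2, 3))

# Hypothetical helper table: the span each tile covers around its centre
extents <- data.frame(
  region = wdf$region,
  centre = seq_along(wdf$region),
  lower  = seq_along(wdf$region) - wdf$w / 2,
  upper  = seq_along(wdf$region) + wdf$w / 2
)
print(extents)

# A tile overlaps its right-hand neighbour when its upper edge
# lies beyond the neighbour's lower edge
overlaps_next <- head(extents$upper, -1) > tail(extents$lower, -1)
print(overlaps_next)
```

With these weights, B spans 1 to 3 and C spans 1.5 to 4.5, so each tile overlaps its neighbour.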

To achieve your desired result you have to manually compute the x and y positions from the desired widths and heights so that the boxes do not overlap. To this end you could switch to a continuous scale and set the desired breaks and labels via scale_x_continuous and scale_y_continuous:

breaks <- wdf %>% 
  mutate(cumw = cumsum(w),
         pos = .5 * (cumw + lag(cumw, default = 0))) %>% 
  select(region, pos)

plot_df <- plot_df %>% 
  left_join(breaks, by = c("origin" = "region")) %>% 
  rename(y = pos) %>% 
  left_join(breaks, by = c("destination" = "region")) %>% 
  rename(x = pos)

ggplot(plot_df, 
       aes(x = x, 
           y = y)) +
  geom_tile(
    aes(
      width = w_destination,
      height = w_origin,
      fill = value), color = "black") +
  scale_x_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1)) +
  scale_y_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1))
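
As an alternative sketch under the same assumptions about the data, the tile edges can be computed directly from the cumulative weights and drawn with geom_rect, which takes xmin, xmax, ymin and ymax aesthetics. The names `edges`, `rect_df`, `lo`, `hi` and `mid` are made up for this illustration; this is just another way to get non-overlapping cells, not a replacement for the approach above.

```r
library(ggplot2)
library(dplyr)

# Data and weights as in the question
df <- data.frame(origin = rep(c("A", "B", "C"), each = 3),
                 destination = rep(c("A", "B", "C"), 3),
                 value = c(0, 1, 10, 5, 0, 11, 15, 6, 0))
wdf <- data.frame(region = c("A", "B", "C"), w = c(1, 2, 3))

# Edges of each region along an axis, taken from the running sum of weights
edges <- wdf %>%
  mutate(hi = cumsum(w),
         lo = hi - w,
         mid = (lo + hi) / 2)

# Attach the edges once for the destination (x) and once for the origin (y)
rect_df <- df %>%
  left_join(edges %>% select(region, xmin = lo, xmax = hi),
            by = c("destination" = "region")) %>%
  left_join(edges %>% select(region, ymin = lo, ymax = hi),
            by = c("origin" = "region"))

p <- ggplot(rect_df) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = value),
            color = "black") +
  scale_x_continuous("destination", breaks = edges$mid, labels = edges$region) +
  scale_y_continuous("origin", breaks = edges$mid, labels = edges$region)
p
```

The midpoints in `edges$mid` (0.5, 2 and 4.5) are the same values as `breaks$pos` above, so the axis labels land in the middle of each row and column.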

Upvotes: 1

John Hadish

Reputation: 95

So I think I have a partial solution for you. After playing around with geom_tile, it appears that the order of your data frame matters when you are using height and width.

Here is some example code I came up with based on yours (run your code first). I converted your data frame to a tibble (tibble() is available through dplyr) to make it easier to sort by a column.

# Converted your dataframe to a tibble dataframe
plot_df_tibble = tibble(plot_df)

# Sorted your dataframe by your w_origin column:
plot_df_tibble2 = plot_df_tibble[order(plot_df_tibble$w_origin),]

# Plotted the sorted data frame:
ggplot(plot_df_tibble2, 
       aes(x = destination, 
           y = origin)) +
  geom_tile(
    aes(
      width = w_destination,
      height = w_origin,
      fill = value),
    color = 'black')

And got this plot: Link to image I made

I should note that if you plot the converted tibble before sorting it, you get the same plot you posted.

It seems like the height and width arguments may not be fully developed for this portion of geom_tile, as I feel that the order of the df should not matter. Cheers

Upvotes: 1
