Arne Brandschwede

Reputation: 85

Heatmap in R with ggplot2

I have a data frame T_mod with 150 observations and 2920 variables, containing subsurface temperature values in °C over one year. It looks like this:

> T_mod[1:10, 1:6]
      t=-24548400 t=-24537600 t=-24526800 t=-24516000 t=-24505200 t=-24494400
z=0.1    9.000187    9.004622    9.009004    9.013332    9.017607    9.021829
z=0.2    8.587763    8.592795    8.597776    8.602705    8.607583    8.612410
z=0.3    8.179728    8.185313    8.190848    8.196334    8.201770    8.207157
z=0.4    7.776561    7.782655    7.788702    7.794702    7.800653    7.806558
z=0.5    7.378704    7.385267    7.391785    7.398256    7.404682    7.411062
z=0.6    6.986564    6.993556    7.000504    7.007408    7.014268    7.021084
z=0.7    6.600512    6.607894    6.615235    6.622533    6.629789    6.637003
z=0.8    6.220886    6.228623    6.236319    6.243975    6.251591    6.259166
z=0.9    5.847995    5.856050    5.864068    5.872046    5.879986    5.887887
z=1      5.482113    5.490454    5.498759    5.507026    5.515257    5.523450

The row names give depth, in 10 cm increments from 0.1 m to 15 m below the surface. The column names give time in elapsed seconds. Each cell holds the temperature in °C at that depth and time.

I want to create a heatmap of temperature with time on the x-axis and depth on the y-axis. The plot below was created with the image.plot function (from the fields package, which builds on base graphics) using the following code:

image.plot(z = t(as.matrix(T_mod[150:1,])), legend.lab = "Temperature (°C)",
           ylab = "Depth (m)", xlab = "Time")

The x-axis represents time (one year in 3 h intervals) and the y-axis represents depth (0 to 15 m in 10 cm increments). The z values are temperatures at a given time and depth. Obviously, the axis ticks and tick labels make little sense as they are. The problem is that the image and image.plot functions are somewhat rigid and do not make it easy to adjust axis ticks, labels, etc.

Now, someone has pointed me towards ggplot2 for greater flexibility in adjusting plot parameters but I have not used ggplot so far. Consequently, the code below does not work.

ggplot(T_mod, aes(x=time, y=Depth, z=Temperature)) +
  geom_tile(aes(fill=Temperature)) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA, size=2))+
  ylab("Depth")+
  xlab("Time")+
        # possibly use stat_contour(binwidth = 0.1,aes(colour = ..level..),size=0.1) +
        # ... and scale_fill_gradient(low = "red", high = "Green") +
        # ... and scale_colour_gradient(low = "black", high = "black",guide = "none")+
  scale_y_continuous(expand = c(0,0),breaks=seq(20, 140, 20),limits=c(20,140),labels=lbl_y)+ 
  scale_x_continuous(expand = c(0,0),breaks=seq(124, 2796, 240),limits=c(124,2796),labels=lbl_x)+
  coord_cartesian(ylim=c(1,150),xlim=c(1,2920))+
  theme(axis.text.x = element_text(size = 15),axis.text.y = element_text(size = 15),axis.title.x = element_text(size = 15),axis.title.y = element_text(size = 15),plot.title = element_text(size=15))+
  ggtitle("Main title")

> lbl_y
[1]  -2  -4  -6  -8 -10 -12 -14
> lbl_x
 [1] "01 Sep" "01 Okt" "01 Nov" "01 Dez" "01 Jan" "01 Feb" "01 Mrz" "01 Apr" "01 Mai"
[10] "01 Jun" "01 Jul" "01 Aug"

I believe the basic issue is that I do not know how to map depth, time, and temperature from the data frame in the aes() call on the first line. Other examples use columns for this, but each column in my data frame holds the temperatures at one point in time, and I want all temperatures plotted as the fill. Any suggestions on how to plot this with ggplot2, or on how to change the image.plot call above so that the axes can be set, are greatly appreciated.

Upvotes: 1

Views: 7452

Answers (2)

Claus Wilke

Reputation: 17810

I agree with the other statements that the data need to be reshaped into tidy format. I would add that geom_raster() is generally a better choice than geom_tile() for large heatmaps: it is a faster special case for grids in which all tiles are the same size. An example follows, using the built-in volcano data since I don't have your dataset.

library(ggplot2)
library(viridis)

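# create tidy version of volcano data
# (volcano is an 87 x 61 matrix; c() unrolls it column by column, so x varies fastest)
nx <- nrow(volcano)  # 87
ny <- ncol(volcano)  # 61
volcano_data <- data.frame(height = c(volcano), x = rep(1:nx, ny), y = rep(1:ny, each = nx))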

# take a look at the dataset. it's indeed tidy.
head(volcano_data)
#   height x y
# 1    100 1 1
# 2    101 2 1
# 3    102 3 1
# 4    103 4 1
# 5    104 5 1
# 6    105 6 1

# plot
ggplot(volcano_data, aes(x, y, fill=height)) + 
  geom_raster() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis()


geom_raster() also allows you to interpolate between adjacent colors for a smoother appearance. This may or may not be useful to you:

ggplot(volcano_data, aes(x, y, fill=height)) + 
  geom_raster(interpolate = TRUE) + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis()
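The same construction can probably be applied to a table shaped like T_mod, with depth in the row names and time in the column names. The following is only a sketch under assumptions: T_demo is a small synthetic stand-in with made-up values (not the real data), and the names T_demo, depth_num, time_num, long_demo and p_raster are introduced here purely for illustration.

```r
library(ggplot2)

# Synthetic stand-in for T_mod (assumption: 5 depths x 4 time steps, made-up values)
depth_steps <- seq(0.1, 0.5, by = 0.1)
time_steps  <- seq(-24548400, by = 10800, length.out = 4)   # 3 h = 10800 s
T_demo <- as.data.frame(matrix(seq_len(20) / 2, nrow = 5))
rownames(T_demo) <- paste0("z=", depth_steps)
colnames(T_demo) <- paste0("t=", time_steps)

# Strip the "z=" / "t=" prefixes to recover numeric depth and time
depth_num <- as.numeric(sub("^z=", "", rownames(T_demo)))
time_num  <- as.numeric(sub("^t=", "", colnames(T_demo)))

# c(as.matrix(...)) unrolls column by column, so depth varies fastest
long_demo <- data.frame(
  depth = rep(depth_num, times = ncol(T_demo)),
  time  = rep(time_num, each = nrow(T_demo)),
  temp  = c(as.matrix(T_demo))
)

# Build the heatmap; print(p_raster) draws it
p_raster <- ggplot(long_demo, aes(time, depth, fill = temp)) +
  geom_raster() +
  scale_y_reverse(expand = c(0, 0)) +   # depth increases downward
  scale_x_continuous(expand = c(0, 0))
```

With the real data, replacing T_demo by T_mod should yield 150 x 2920 = 438,000 rows, which is the size range where geom_raster() tends to pay off.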


Upvotes: 2

Calum You

Reputation: 15072

As I mentioned in a comment, I think you need to gather your data, at least if it is laid out as shown, with time in columns and depth in rows. ggplot2 is designed to work with tidy data, where each row is an observation and each variable is a column. Here, that means just three columns (depth, time and temp), with each row a single measurement. You can do this with the code below.

  1. Use gather to combine all the time columns into a single one
  2. Use separate to split the time and depth labels and keep only the numeric part
  3. Use select to drop unneeded variables
  4. Use mutate_at to convert the values stored as strings into numbers

Then, ggplot becomes easy to use. geom_tile is designed for three main aesthetics, x, y, and fill. We just call geom_tile and map its aesthetics to the variables we want, and produce the plot below. I include scale_fill_viridis which changes the colours to perceptually uniform ones, but that isn't necessary. You might not need all these steps if your data isn't stored exactly as shown.

As far as the axis ticks go, you probably do want scale_x_continuous but I am not sure what units your time values are in right now.
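If the time values are elapsed seconds relative to a known reference instant, as the question describes, one option is to convert them to date-times and let scale_x_datetime() place and label the ticks. The sketch below is an illustration only: the reference instant origin is hypothetical (the question does not state it), the small demo table holds made-up values, and the names origin, demo and p_time are introduced here.

```r
library(ggplot2)

# Hypothetical reference instant (assumption); replace with the one your model uses
origin <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")

# Small made-up long-format table: 3 depths x 8 time steps, 3 h apart
demo <- expand.grid(
  depth = c(0.1, 0.2, 0.3),
  time  = seq(-24548400, by = 10800, length.out = 8)
)
demo$temp <- 9 - 4 * demo$depth + (demo$time %% 86400) / 86400   # made-up values

# Adding a number to a POSIXct value shifts it by that many seconds
demo$datetime <- origin + demo$time

# Build the heatmap with a date-time x-axis; print(p_time) draws it
p_time <- ggplot(demo, aes(datetime, depth, fill = temp)) +
  geom_tile() +
  scale_x_datetime(date_breaks = "6 hours", date_labels = "%d %b %H:%M",
                   expand = c(0, 0)) +
  scale_y_reverse(expand = c(0, 0)) +   # depth increases downward
  labs(x = "Time", y = "Depth (m)", fill = "Temperature (°C)")
```

For a full year of data, date_breaks = "1 month" with date_labels = "%d %b" would give labels along the lines of "01 Sep", "01 Oct", and so on. Month abbreviations follow the R session's locale.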

For more info on tidy data and on ggplot, try these chapters.

library(tidyverse)
library(viridis)
tbl <- read_table2(
  "depth   t=-24548400 t=-24537600 t=-24526800 t=-24516000 t=-24505200 t=-24494400
  z=0.1    9.000187    9.004622    9.009004    9.013332    9.017607    9.021829
  z=0.2    8.587763    8.592795    8.597776    8.602705    8.607583    8.612410
  z=0.3    8.179728    8.185313    8.190848    8.196334    8.201770    8.207157
  z=0.4    7.776561    7.782655    7.788702    7.794702    7.800653    7.806558
  z=0.5    7.378704    7.385267    7.391785    7.398256    7.404682    7.411062
  z=0.6    6.986564    6.993556    7.000504    7.007408    7.014268    7.021084
  z=0.7    6.600512    6.607894    6.615235    6.622533    6.629789    6.637003
  z=0.8    6.220886    6.228623    6.236319    6.243975    6.251591    6.259166
  z=0.9    5.847995    5.856050    5.864068    5.872046    5.879986    5.887887
  z=1      5.482113    5.490454    5.498759    5.507026    5.515257    5.523450"
)

tidy_tbl <- tbl %>%
  # stack all time columns into time/temp pairs
  gather(key = "time", value = "temp", starts_with("t=")) %>%
  # split the labels on "=" so the minus sign of the time values is kept
  separate(depth, c("z", "depth"), sep = "=") %>%
  separate(time, c("t", "time"), sep = "=") %>%
  select(-z, -t) %>%
  mutate_at(vars(depth, time), as.numeric) %>%
  print()
# A tibble: 60 x 3
   depth      time  temp
   <dbl>     <dbl> <dbl>
 1 0.100 -24548400  9.00
 2 0.200 -24548400  8.59
 3 0.300 -24548400  8.18
 4 0.400 -24548400  7.78
 5 0.500 -24548400  7.38
 6 0.600 -24548400  6.99
 7 0.700 -24548400  6.60
 8 0.800 -24548400  6.22
 9 0.900 -24548400  5.85
10 1.00  -24548400  5.48
# ... with 50 more rows


ggplot(data = tidy_tbl) +
  theme_bw() +
  geom_tile(aes(x = time, y = depth, fill = temp)) +
  scale_fill_viridis(name = "Temp") + 
  labs(x = "Time", y = "Depth")
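In more recent versions of tidyr, gather() is superseded by pivot_longer(), which can strip the column-name prefix and convert the names to numbers in one call. The following is a minimal sketch rather than a drop-in replacement: it assumes tidyr 1.1.0 or later (for names_transform), and the two-row table wide and the result long are made up here for illustration. Because only the "t=" prefix is stripped, this variant keeps the sign of the time values.

```r
library(tidyr)
library(dplyr)

# Tiny made-up table in the same layout as the data in the question
wide <- tibble(
  depth = c("z=0.1", "z=0.2"),
  `t=-24548400` = c(9.000187, 8.587763),
  `t=-24537600` = c(9.004622, 8.592795)
)

long <- wide %>%
  pivot_longer(
    cols = starts_with("t="),
    names_to = "time",
    names_prefix = "t=",                        # strip the prefix, keep the sign
    names_transform = list(time = as.numeric),  # column names become numbers
    values_to = "temp"
  ) %>%
  mutate(depth = as.numeric(sub("^z=", "", depth)))  # "z=0.1" becomes 0.1
```

The result has the same three columns (depth, time, temp) as the gather-based pipeline, although the rows come out ordered by depth first rather than by time.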


Upvotes: 3
