alko989

Reputation: 7908

2D summary plot with counts as labels

I have measurements of a quantity (value) at specific points (lon and lat), like the example data below:

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))

I want to plot a 2D summary (e.g. the mean) of the measured values as color, and on top of that show the count in each bin as a label.

I can plot the summary and the labels separately:

## Left plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex")
## Right plot
ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

enter image description here

But when I combine both, I lose the summary:

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

enter image description here

I can achieve the opposite, counts as color and summary as labels:

ggplot(dat, aes(lon, lat, z = value)) +
  geom_hex(bins = 5) +
  stat_summary_hex(aes(label=..value..), bins = 5, 
                   fun = function(x) round(mean(x), 3), 
                   geom = "text")

enter image description here

Upvotes: 6

Views: 518

Answers (4)

Marek Fiołka

Reputation: 4949

I propose a completely different approach to this problem. However, it needs to be clarified a bit first. You write "I have measurements of a quantity (value) at specific points (lon and lat)", but you do not specify these points exactly. Your generated data contains 1000 distinct lon values and 1000 distinct lat values.

Anyway, see for yourself.

library(tidyverse)

set.seed(1)
dat <- 
  tibble(
    lon = runif(1000, 1, 15), 
    lat = runif(1000, 40, 60), 
    value = rnorm(1000)
  ) 

dat %>% distinct(lon) %>% nrow() #1000
dat %>% distinct(lat) %>% nrow() #1000

My guess is that for real data you have a much smaller set of distinct lon and lat values. Let me round the coordinates to a grid with a spacing of 2.

grid = 2

datg = dat %>% mutate(
    lon = round(lon/grid)*grid,
    lat = round(lat/grid)*grid,
  ) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )

As you can see, after rounding I grouped the data by these two variables and then calculated the statistics you are interested in (the mean and the number of observations).

Also note that these statistics are computed at the intersections of lon and lat, so we have a square grid. In your solution this is not the case at all: you are not getting the number of observations at these points, and your grid is not square.

So let's make a graph.

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()

enter image description here
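
The rounding step above snaps every coordinate to the nearest multiple of grid. A tiny worked example with made-up values may make that clearer (the helper name `snap_to_grid` and the sample numbers are illustrative, not part of the code above):

```r
# Snap values to the nearest multiple of `grid` (hypothetical helper)
snap_to_grid <- function(x, grid) round(x / grid) * grid

# Made-up longitudes, for illustration only
lon <- c(1.2, 2.9, 3.1, 14.6)
snapped <- snap_to_grid(lon, grid = 2)
snapped  # 2 2 4 14

# Points that land on the same grid node end up in the same group
table(snapped)
```

The first two values share the node at 2, so after group_by() they would contribute to one mean and one count.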

Nothing stands in the way of making the grid a bit coarser, say 4.

grid = 4

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n()
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  theme_bw()

enter image description here

With this approach we can easily add further labels at the grid points, e.g. the mean value. This time we will use grid = 1.5.

grid = 1.5

datg = dat %>% mutate(
  lon = round(lon/grid)*grid,
  lat = round(lat/grid)*grid,
) %>% 
  group_by(lon, lat) %>% 
  summarise(
    mean = mean(value),
    label = n(),
    lab2 = paste0("(", round(mean, 2), ")")
  )

datg %>% ggplot(aes(lon,lat,z=mean)) + 
  geom_contour_filled(binwidth = 0.25) + 
  geom_text(aes(label = label)) + 
  geom_text(aes(label = lab2), nudge_y = -.5, size = 3) + 
  theme_bw()

enter image description here

I hope this fits your needs better than the stat_binhex-based solution.

Upvotes: 2

tjebo

Reputation: 23717

To be fair, I find this very strange behaviour. I like your solution though: I really don't find it very hacky to add fill = NULL. On the contrary, I find it very elegant. Here is a hackier approach that gives basically the same result, but with one more line. It uses ggnewscale.

library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15), 
                  lat = runif(1000, 40, 60), 
                  value = rnorm(1000))
ggplot(dat) +
  aes(x = lon, y = lat,z = value) +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  ggnewscale::new_scale_fill() +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text")

Created on 2022-02-17 by the reprex package (v2.0.1)

Upvotes: 1

Waldi

Reputation: 41210

The problem here is that both plots share the same legend scale.

Because the two ranges differ so much (roughly 0 to 40 for the counts vs. -1.5 to 0.5 for the means), the larger range dominates and all values in the smaller range are drawn in (almost) the same color.

This is why displaying the count as color works, but the opposite doesn't seem to work.

As an illustration, if you rescale the mean calculation, the color variations become visible:

rescaled_mean <- function(x) mean(x)*40

ggplot(dat) +
  aes(x = lon, y = lat, z = value) +
  stat_summary_hex(bins = 5, fun = "rescaled_mean", geom = "hex") +
  stat_binhex(aes(label = ..count..), bins = 5, geom = "text") +
  theme_bw()

enter image description here
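
The two ranges can also be checked directly from the computed layer data. This is a minimal sketch, assuming the question's dat and an installed hexbin package (the hex stats need it); `p_mean` and `p_count` are illustrative names:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
                  lat = runif(1000, 40, 60),
                  value = rnorm(1000))

# One plot per layer, so each can be inspected on its own
p_mean <- ggplot(dat, aes(lon, lat, z = value)) +
  stat_summary_hex(bins = 5, fun = "mean")
p_count <- ggplot(dat, aes(lon, lat)) +
  stat_bin_hex(bins = 5)

# layer_data() returns the data after the stat has been computed
mean_range  <- range(layer_data(p_mean)$value)
count_range <- range(layer_data(p_count)$count)

mean_range
count_range
```

When both layers sit in one plot, the fill scale has to cover both ranges at once, which squeezes the means into a narrow band of colors.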

Upvotes: 1

alko989

Reputation: 7908

While writing the question, which took some hours of testing, I found a solution: adding fill = NULL or fill = mean(value) to the aes() of the text layer gives me what I want. Below are the two versions of the code and the resulting plot; the only difference between them is the label of the legend.

But it feels very hacky, so I would appreciate a better solution.

ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = NULL), bins = 5, geom = "text") +
  theme_bw()



ggplot(dat) +
  aes(x = lon, y = lat, z = value)  +
  stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
  stat_binhex(aes(label = ..count.., fill = mean(value)), bins = 5, geom = "text") +
  theme_bw()

enter image description here
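
One way to see why fill = NULL helps: as far as I can tell, each stat ships its own default mapping for fill, so the text layer still feeds the count into the shared fill scale even though text itself has no fill. The sketch below only prints those defaults; it assumes a ggplot2 version that exports the `StatBinhex` and `StatSummaryHex` objects, and the variable names are mine.

```r
library(ggplot2)

# Default aesthetics each stat adds when the layer does not override them
binhex_defaults  <- StatBinhex$default_aes
summary_defaults <- StatSummaryHex$default_aes

# Both are expected to list "fill" among their defaults
names(binhex_defaults)
names(summary_defaults)

# Show what fill is mapped to by default in the counting stat
binhex_defaults$fill
```

Setting fill = NULL in the layer's aes() removes that default for the text layer, so only the mean trains the fill scale.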

Upvotes: 6
