Annelise Dahl

Reputation: 75

Adding the maximum (peak) value in ggplot for geom_smooth

I have a geom_smooth plot with date on the x-axis, COVID cases on the y-axis, and two categories. I'm trying to mark the maximum (peak) of each smoothed curve.

# Reproducible data
library(tidyverse)
df <- tribble(~date, ~cases, ~category,
              "2021/1/1", 100, "A",
              "2021/1/1", 103, "B",
              "2021/1/2", 108, "A",
              "2021/1/2", 109, "B",
              "2021/1/3", 102, "A",
              "2021/1/3", 120, "B",
              "2021/1/4", 150, "A",
              "2021/1/4", 160, "B",
              "2021/1/5", 120, "A",
              "2021/1/5", 110, "B",
              "2021/1/6", 115, "A",
              "2021/1/6", 105, "B",)

# Plotting geom_smooth
df %>%
  ggplot(aes(date, cases, group = category, color = category)) +
  geom_smooth()

How do I add the maximum peak to the geom_smooth? Ideally, I want both a point and a text that says what the peak case is.

I tried finding the peaks outside of the ggplot code, but that returns a different peak, because geom_smooth fits its own function rather than simply taking the mean of each category.
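One way to read back the curve that geom_smooth actually drew is ggplot2's layer_data(), which returns the computed data for a layer. The following is only a sketch on made-up data: the seed, the bump-shaped values, and the names p, smoothed, and peaks are assumptions for illustration. Note that the curve is evaluated on a grid of points, so the peak found this way is approximate.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the made-up data is reproducible

# Made-up data: one noisy bump per category
day <- 1:60
df <- data.frame(
  date     = rep(seq(as.Date("2021-01-01"), by = 1, length.out = 60), times = 2),
  category = rep(c("A", "B"), each = 60),
  value    = c(2500 * dnorm(day, 30, 10), 3500 * dnorm(day, 35, 10)) + rnorm(120)
)

p <- ggplot(df, aes(date, value, color = category)) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5, se = FALSE)

# Computed data behind the first (and only) layer: one row per grid point
smoothed <- layer_data(p, 1)

# Keep the grid point with the largest fitted value in each group
peaks <- do.call(
  rbind,
  lapply(split(smoothed, smoothed$group), function(d) d[which.max(d$y), ])
)

# x comes back as days since 1970-01-01, so convert it to Date again
peaks$date <- as.Date(peaks$x, origin = "1970-01-01")
# Assumption: group ids follow the sorted category values
peaks$category <- sort(unique(df$category))[peaks$group]

p +
  geom_point(data = peaks, aes(date, y), color = "black", size = 3,
             inherit.aes = FALSE) +
  geom_text(data = peaks, aes(date, y, label = paste0("Peak: ", round(y, 1))),
            color = "black", vjust = -1, inherit.aes = FALSE)
```

Because the values come from the plot itself, they match the drawn curve whatever method or span geom_smooth was given.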

The answer below worked. I now want to move the labels to make them more legible, but geom_text_repel seems to refer only to the first curve rather than both. Any advice?

library(ggplot2)
library(tidyverse)
library(ggrepel)

# Fake data
ar =hist(rnorm(10000,1), breaks = 180, plot=F)$counts
br =hist(rnorm(11000,1), breaks = 180, plot=F)$counts

df <-  rbind(
  tibble(category="B", date = seq(as.Date("2021-01-01"),by=1, length.out=length(br)),value=br),
  tibble(category="A", date = seq(as.Date("2021-01-01"),by=1, length.out=length(ar)),value=ar)
)
# create the smooth and retain rows with max of smooth, using slice_max
sm_max = df %>% group_by(category) %>%
  mutate(smooth =predict(loess(value~as.numeric(date), span=.5))) %>% 
  slice_max(order_by = smooth)

# Plot, using the same smooth as above (default is loess, span set to the same value as above)
df %>%
  ggplot(aes(date, value, group = category, color = category)) +
  geom_point() +
  geom_smooth(span = .5, se = FALSE) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 5) +
  # y = smooth anchors each label at the smoothed peak rather than at the raw value
  geom_text_repel(data = sm_max, aes(y = smooth, label = paste0("Peak: ", round(smooth, 1))), color = "black")

geom_text_repel(data = sm_max_p3, aes(x = date,
                                      y = smooth,
                                      label = paste0(candidate, " Peak: ", round(smooth, 1))))


Upvotes: 3

Views: 2124

Answers (2)

langtang

Reputation: 24877

You need to generate the smooth first and identify its maximum. You can then either

  1. plot the data, the smooth, and the max together, or
  2. plot the data and the max, and call geom_smooth() again, making sure it uses the same method and span as the smooth you used to find the max.

Here is an example that uses the second option:

library(tidyverse)

# Fake data
ar <- hist(rnorm(10000, 1), breaks = 180, plot = FALSE)$counts
br <- hist(rnorm(25000, 1), breaks = 180, plot = FALSE)$counts

df <- rbind(
  tibble(category = "B", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(br)), value = br),
  tibble(category = "A", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(ar)), value = ar)
)

# Create the smooth and keep the rows with the max of the smooth per category, using slice_max
sm_max <- df %>% group_by(category) %>%
  mutate(smooth = predict(loess(value ~ as.numeric(date), span = .5))) %>%
  slice_max(order_by = smooth)

# Plot, using the same smooth as above (default is loess, span set to the same value as above)
df %>%
  ggplot(aes(date, value, group = category, color = category)) +
  geom_point() +
  geom_smooth(span = .5, se = FALSE) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 5) +
  geom_text(data = sm_max, aes(y = smooth, label = paste0("Peak: ", round(smooth, 1))), color = "black")
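For comparison, the first option (plotting the data, the precomputed smooth, and the max together) could look roughly like the sketch below. It draws the stored loess fit with geom_line() instead of calling geom_smooth(), so the curve and the labelled peak cannot drift apart. The seed, the made-up bump-shaped data, and the name df_smooth are assumptions for illustration only.

```r
library(dplyr)
library(tibble)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the made-up data is reproducible

# Made-up data: one noisy bump per category
day <- 1:60
df <- tibble(
  category = rep(c("A", "B"), each = 60),
  date     = rep(seq(as.Date("2021-01-01"), by = 1, length.out = 60), times = 2),
  value    = c(2500 * dnorm(day, 30, 10), 3500 * dnorm(day, 35, 10)) + rnorm(120)
)

# Fit the loess once per category and store the fitted values
df_smooth <- df %>%
  group_by(category) %>%
  mutate(smooth = predict(loess(value ~ as.numeric(date), span = 0.5))) %>%
  ungroup()

# One row per category: the maximum of the fitted curve
sm_max <- df_smooth %>%
  group_by(category) %>%
  slice_max(order_by = smooth, n = 1, with_ties = FALSE) %>%
  ungroup()

# Draw the stored fit directly, then mark and label its maximum
ggplot(df_smooth, aes(date, value, color = category)) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = smooth)) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 3) +
  geom_text(data = sm_max,
            aes(y = smooth, label = paste0("Peak: ", round(smooth, 1))),
            color = "black", vjust = -1)
```

The trade-off is that you lose the confidence ribbon geom_smooth() can draw, unless you compute the standard errors yourself.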


Upvotes: 1

Dan Adams

Reputation: 5254

If you're just looking to label the maximum measured value, you can use {gghighlight} to show and label only that point on top of the smoothed curve. Also, your date is a character vector, so ggplot treats it as a discrete variable and your geom_smooth() is just a point-to-point line. Here, I convert it to a continuous variable with mutate(date = lubridate::ymd(date)).

library(tidyverse)
library(lubridate)
library(gghighlight)

df <- tribble(~date, ~cases, ~category,
              "2021/1/1", 100, "A",
              "2021/1/1", 103, "B",
              "2021/1/2", 108, "A",
              "2021/1/2", 109, "B",
              "2021/1/3", 102, "A",
              "2021/1/3", 120, "B",
              "2021/1/4", 150, "A",
              "2021/1/4", 160, "B",
              "2021/1/5", 120, "A",
              "2021/1/5", 110, "B",
              "2021/1/6", 115, "A",
              "2021/1/6", 105, "B",)

# Plotting geom_smooth
df %>%
  mutate(date = ymd(date)) %>%
  group_by(category) %>%
  mutate(is_max = cases == max(cases)) %>% 
  ggplot(aes(date, cases, color = category)) +
  geom_smooth() +
  geom_point(size = 3) +
  gghighlight(is_max,
              n = 1,
              unhighlighted_params = list(alpha = 0),
              label_key = cases)

Created on 2022-02-17 by the reprex package (v2.0.1)

Upvotes: 1
