Murillo

Reputation: 1

How can I get the peaks and valleys of a geom_smooth line in ggplot2?

I'm plotting some data across the year and I need the peaks and valleys of the smoothed curve itself, not of the raw dataset, along with the dates on which they occur. How can I do that?

g <- ggplot(df, aes(x=date, y=ndvitrend)) + geom_point() + geom_smooth(method = "gam", se=FALSE) + theme_minimal() +
     scale_x_date(date_labels="%b %Y", date_breaks = "1 month") + 
     theme(plot.title = element_text(hjust = 0.5)) + theme(axis.line = element_line(color = 'black')) +
     theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) + 
     stat_peaks(span=NULL, color="red")

This marks the peak of the dataset, not the peak of the line.

Thank you

Upvotes: 0

Views: 1082

Answers (1)

Allan Cameron

Reputation: 173858

It's far easier to answer this type of question if we have reproducible data. However, I will recreate something that is similar to your data set:

set.seed(69)

df <- data.frame(date = seq(as.Date("2019-09-01"), 
                            as.Date("2020-09-01"), by = "3 days"),
                 ndvitrend = 0.3 * sin(seq(-2, 2 * pi - 2, length.out = 123)) +
                             rnorm(123, 0.5, 0.2))

Now let's plot this using your code:

library(ggpmisc)

g <- ggplot(df, aes(x = date, y = ndvitrend)) + 
      geom_point() + 
      geom_smooth(method = "gam", se = FALSE) + 
      stat_peaks(span = NULL, color = "red") +
      theme_minimal() +
      scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") + 
      theme(plot.title  = element_text(hjust = 0.5),
            axis.line   = element_line(color = 'black'),
            axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
g
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

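You'll notice that the console told us the formula that was used to create the smoothed line. We can therefore fit the same model ourselves to answer your question. We need the gam function from package mgcv, and because gam needs a numeric predictor, we first convert the dates to days elapsed since the first observation: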

library(mgcv)

df$days <- as.numeric(difftime(df$date, df$date[1], units = "days"))
model   <- gam(ndvitrend ~ s(days, bs = "cs"), data = df)
df$prediction <- predict(model)
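
One limitation worth noting: predict(model) returns fitted values only at the observed dates, which in this example are three days apart, so any peak or valley read from df$prediction is resolved to the nearest observation. A minimal sketch of one way around that, assuming the same simulated data as above, is to evaluate the model on a daily grid with predict(model, newdata = ...). The grid data frame and its column names are illustrative choices, not part of the original code.

```r
library(mgcv)

# Recreate the simulated data and the model from this answer
set.seed(69)
df <- data.frame(date = seq(as.Date("2019-09-01"),
                            as.Date("2020-09-01"), by = "3 days"),
                 ndvitrend = 0.3 * sin(seq(-2, 2 * pi - 2, length.out = 123)) +
                             rnorm(123, 0.5, 0.2))
df$days <- as.numeric(df$date - df$date[1])
model   <- gam(ndvitrend ~ s(days, bs = "cs"), data = df)

# Illustrative daily grid spanning the observed range (an assumption,
# chosen so that extrema are located to the nearest day)
grid <- data.frame(days = seq(min(df$days), max(df$days), by = 1))
grid$date       <- df$date[1] + grid$days
grid$prediction <- as.numeric(predict(model, newdata = grid))

# Dates of the highest and lowest points of the fitted curve
grid$date[which.max(grid$prediction)]
grid$date[which.min(grid$prediction)]
```

The rest of this answer works with the predictions at the observed dates.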

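So now we have stored the predictions from this model in our data frame. That should give us essentially the same smoothing curve that geom_smooth drew (depending on your ggplot2 version, geom_smooth may fit the gam with method = "REML", in which case passing method = "REML" to gam() should match it more closely):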

g + geom_line(aes(y = prediction), data = df, 
                   size = 3, linetype = 2, col = "red")
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

The two curves coincide, so our model reproduces the smoothed line. Now all we need to do is find out where the peak of our prediction was:

g + geom_hline(yintercept = max(df$prediction), linetype = 2)
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

So the peak value of the smoothed curve in this data set is:

max(df$prediction)
#> [1] 0.76714

And it occurs on:

df$date[which.max(df$prediction)]
#> [1] "2020-03-20"
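
The question also asks for valleys. Here is a sketch under the same assumptions (the simulated data and the gam fit above): which.min() gives the lowest point of the fitted curve, and a sign change in the differences of successive predictions marks each local turning point. local_extrema is a hypothetical helper written for this illustration, not a function from ggplot2, ggpmisc or mgcv.

```r
library(mgcv)

# Recreate the simulated data and the fitted curve from this answer
set.seed(69)
df <- data.frame(date = seq(as.Date("2019-09-01"),
                            as.Date("2020-09-01"), by = "3 days"),
                 ndvitrend = 0.3 * sin(seq(-2, 2 * pi - 2, length.out = 123)) +
                             rnorm(123, 0.5, 0.2))
df$days <- as.numeric(df$date - df$date[1])
model   <- gam(ndvitrend ~ s(days, bs = "cs"), data = df)
df$prediction <- as.numeric(predict(model))

# Global valley: the lowest point of the smoothed curve and its date
valley_value <- min(df$prediction)
valley_date  <- df$date[which.min(df$prediction)]

# Hypothetical helper: indices where the slope changes sign.
# Rising then falling is a local peak; falling then rising is a local valley.
# Endpoints are never reported, and exactly flat stretches are not handled.
local_extrema <- function(y) {
  turn <- diff(sign(diff(y)))
  list(peaks   = which(turn < 0) + 1,
       valleys = which(turn > 0) + 1)
}

extrema <- local_extrema(df$prediction)
df[extrema$peaks,   c("date", "prediction")]  # local peaks of the curve
df[extrema$valleys, c("date", "prediction")]  # local valleys of the curve
```

If the lowest fitted value falls on the first or last date, it is a boundary minimum rather than a turning point, so it shows up in valley_date but not in extrema$valleys.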

Created on 2020-09-18 by the reprex package (v0.3.0)

Upvotes: 4
