Reputation: 1
I'm plotting some data across the year and I need to get the peaks and valleys of the smoothed curve, not of the raw data points, and then identify the dates on which they occur. How can I do that?
g <- ggplot(df, aes(x=date, y=ndvitrend)) + geom_point() + geom_smooth(method = "gam", se=FALSE) + theme_minimal() +
  scale_x_date(date_labels="%b %Y", date_breaks = "1 month") +
  theme(plot.title = element_text(hjust = 0.5)) + theme(axis.line = element_line(color = 'black')) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  stat_peaks(span=NULL, color="red")
Thank you
Upvotes: 0
Views: 1082
Reputation: 173858
It's far easier to answer this type of question if we have reproducible data. However, I will recreate something that is similar to your data set:
set.seed(69)
df <- data.frame(date = seq(as.Date("2019-09-01"),
                            as.Date("2020-09-01"), by = "3 days"),
                 ndvitrend = 0.3 * sin(seq(-2, 2 * pi - 2, length.out = 123)) +
                   rnorm(123, 0.5, 0.2))
Now let's plot this using your code:
library(ggpmisc)
g <- ggplot(df, aes(x = date, y = ndvitrend)) +
  geom_point() +
  geom_smooth(method = "gam", se = FALSE) +
  stat_peaks(span = NULL, color = "red") +
  theme_minimal() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.line = element_line(color = 'black'),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
g
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
You'll notice that the console told us the formula that was being used to create the smoothed line. We can therefore fit the same model ourselves to answer your question. We need the gam function from package mgcv:
library(mgcv)
df$days <- as.numeric(difftime(df$date, df$date[1], units = "day"))
model <- gam(ndvitrend ~ s(days, bs = "cs"), data = df)
df$prediction <- predict(model)
So now we have stored the predictions from this model in our data frame. That should give us the identical smoothing curve that geom_smooth gave us:
g + geom_line(aes(y = prediction), data = df,
              size = 3, linetype = 2, col = "red")
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
This is correct: the dashed red line sits on top of the smoothed curve. Now all we need to do is find out where the peak of our prediction is:
g + geom_hline(yintercept = max(df$prediction), linetype = 2)
#> `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
So we can see that the smoothed peak in this data set is:
max(df$prediction)
#> [1] 0.76714
And it occurs on:
df$date[which.max(df$prediction)]
#> [1] "2020-03-20"
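The question also asks about valleys, and about every turning point rather than only the highest one. The lowest point follows the same pattern with which.min(df$prediction). For local peaks and valleys, one possible approach (a minimal sketch, not part of the original answer) is to look for sign changes in the first difference of the predicted values. The helper name find_turning_points is hypothetical, and the sine curve below is stand-in data so that the block runs on its own:

```r
# Hypothetical helper (an illustration, not from the original answer):
# returns the indices of interior local maxima and minima of a numeric vector.
find_turning_points <- function(y) {
  slope_sign <- sign(diff(y))   # +1 where the curve rises, -1 where it falls
  turn <- diff(slope_sign)      # negative at a peak, positive at a valley
  list(peaks   = which(turn < 0) + 1L,
       valleys = which(turn > 0) + 1L)
}

# Stand-in data: one full sine period sampled at 101 points
x <- seq(0, 2 * pi, length.out = 101)
y <- sin(x)

tp <- find_turning_points(y)
x[tp$peaks]     # location of the local maximum
x[tp$valleys]   # location of the local minimum
```

With the data frame from this answer, df$date[find_turning_points(df$prediction)$peaks] and df$date[find_turning_points(df$prediction)$valleys] would then give the corresponding dates. This simple rule can report a flat stretch of the curve more than once, and the dates are resolved only to the spacing of the observations (3 days here).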
Created on 2020-09-18 by the reprex package (v0.3.0)
Upvotes: 4