LazyMark2

Reputation: 25

How can I know the extremum of a polynomial regression curve generated by geom_smooth in R?

This question came up while I was reading "R for Data Science" by Hadley Wickham. In the "Data Transformation" chapter, the author uses this example:

library(nycflights13)
library(tidyverse)
by_dest <- group_by(flights, dest)
delay <- summarise(by_dest,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)
#> `summarise()` ungrouping output (override with `.groups` argument)
delay <- filter(delay, count > 20, dest != "HNL")

# It looks like delays increase with distance up to ~750 miles 
# and then decrease. Maybe as flights get longer there's more 
# ability to make up delays in the air?
ggplot(data = delay, mapping = aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +
  geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The output plot looks as follows: [Figure: relationship between delay and flight distance]

Is there a way to find the extremum of the curve and the corresponding x value using R?

From this answer, I figured out how to use stat_poly_eq() from ggpmisc to display the polynomial regression equation:

library(ggpmisc)
formula <- y ~ poly(x, 3, raw = TRUE)
p <- ggplot(data = delay, mapping = aes(x = dist, y = delay))
p <- p + geom_point(aes(size = count), alpha = 1/3)
p <- p + geom_smooth(method = "lm", formula = formula, se = FALSE)
(p1 <- p + stat_poly_eq(formula = formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE))

[Figure: plot annotated with the polynomial equation]

I could then use that equation to work out the extremum of the curve by hand, but that does not seem like a clever approach. How can I find the extremum and the corresponding x value of my regression curve (not the max and min of the raw data, but the max and min of the regression curve)?

Upvotes: 0

Views: 105

Answers (1)

henryn

Reputation: 1237

As your first code snippet shows, your ggplot call outputs this message:

#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Looking at the documentation for geom_smooth and stat_smooth:

stats::loess() is used for less than 1,000 observations; otherwise mgcv::gam() is used with formula = y ~ s(x, bs = "cs") with method = "REML".

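So you could call stats::loess on your data yourself to fit the model behind the geom_smooth curve, and then use its predict method to evaluate the curve at x values that are not in your data, for example on a fine grid across the observed range. The largest or smallest prediction on that grid approximates the extremum of the curve. Note that loess does not extrapolate beyond the observed x range by default; predict returns NA there.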
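A minimal sketch of that idea, under some assumptions: smooth_peak is a hypothetical helper name (it is not part of ggplot2 or stats), the grid size of 1000 points is arbitrary, and span = 0.75 is used because it is the default of both stats::loess and geom_smooth. Since the geom_smooth layer in the question has no weight aesthetic, an unweighted fit should correspond to the plotted curve, but it is worth checking the result against the plot.

```r
# Hypothetical helper: approximate the maximum of a loess curve on a fine grid.
smooth_peak <- function(x, y, n = 1000, span = 0.75) {
  # Drop missing or non-finite pairs so min()/max() below are well defined
  ok <- is.finite(x) & is.finite(y)
  d <- data.frame(x = x[ok], y = y[ok])

  # Fit the same kind of model geom_smooth() reports using (method = 'loess')
  fit <- stats::loess(y ~ x, data = d, span = span)

  # Evaluate the fit only inside the observed range:
  # by default loess returns NA when asked to extrapolate
  grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = n))
  grid$y <- predict(fit, newdata = grid)

  # Row with the largest fitted value; use which.min() for a minimum
  grid[which.max(grid$y), ]
}

# With the `delay` data frame from the question:
# smooth_peak(delay$dist, delay$delay)
```

The returned row holds the x position of the peak and the fitted value there. The answer is only as precise as the grid spacing, so increase n if you need a finer estimate.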

Upvotes: 0
