ggplot geom_smooth (using glm and y ~ poly(x,2)) and glm(), approx() outside ggplot do not match

Question

> DF
  x.values  y.values
         0 1.0000000
         2 0.5443318
         4 0.4098546
         6 0.3499007

ggplot(DF, aes(x=x.values, y=y.values)) + 
     geom_point() +
     geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))

Gives me a polynomial fit to the data which looks like this:

[Image: plot of the data points with the fitted polynomial curve (source: preview.ibb.co)]

From the image I can visually estimate the interpolated x.value for y.value = 0.5 to be ~2.5-2.6.

However, when I estimate the interpolated x.value outside of ggplot, I get a value of 2.78.

M <- glm(formula = y.values ~ poly(x.values,2), data = DF)
t0.5 <- approx(x = M$fitted, y = DF$x.values, xout=0.50)$y
t0.5
[1] 2.780246

Can anyone please explain this discrepancy?

eipi10 · Accepted Answer

The model is predicting y.values from x.values, so the fitted values of the model are y.values, not x.values. Thus, the code should be t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y. After making this change, you can see that linear interpolation and model prediction are what one would expect by visual inspection of the plot.

library(ggplot2)

p = ggplot(DF, aes(x=x.values, y=y.values)) + 
  geom_point() +
  geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))


M <- glm(formula = y.values ~ poly(x.values,2), data = DF)

# linear interpolation of fitted values at x.values=0.5
t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y

# glm model prediction at x.values=0.5
predy = predict(M, newdata=data.frame(x.values=0.5))

# Data frame with linear interpolation of predictions along the full range of x.values
interp.fit = as.data.frame(approx(x=DF$x.values, y=fitted(M), 
                                  xout=seq(min(DF$x.values), max(DF$x.values),length=100)))

p + 
  geom_line(data=interp.fit, aes(x,y), colour="red", size=0.7) +
  annotate(x=0.5, y=t0.5, geom="point", shape=3, colour="red", size=4) +
  annotate(x=0.5, y=predy, geom="point", shape=16, colour="purple", size=4)
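As for where the 2.78 in the question comes from, here is a minimal sketch, assuming the DF from the question. The question's approx() call interpolates linearly between only the four fitted values, while geom_smooth draws its curve from model predictions on a much finer grid (80 points by default), so the two need not agree. The names x_coarse, x_dense, and grid are illustrative and not part of the original code.

```r
# Data from the question
DF <- data.frame(
  x.values = c(0, 2, 4, 6),
  y.values = c(1.0000000, 0.5443318, 0.4098546, 0.3499007)
)

M <- glm(y.values ~ poly(x.values, 2), data = DF)

# Inverse interpolation through the four fitted values only
# (equivalent to the approx() call in the question)
x_coarse <- approx(x = fitted(M), y = DF$x.values, xout = 0.5)$y

# Inverse interpolation through a dense grid of model predictions.
# The grid stops at x = 5, before the minimum of the fitted parabola,
# so the predictions are strictly decreasing over the grid.
grid <- data.frame(x.values = seq(0, 5, length.out = 501))
grid$pred <- predict(M, newdata = grid)
x_dense <- approx(x = grid$pred, y = grid$x.values, xout = 0.5)$y

print(c(coarse = x_coarse, dense = x_dense))
```

With the question's data, the coarse version reproduces the 2.78 and the dense version comes out at roughly 2.58, in line with the visual estimate from the plot.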

In response to the comment: To calculate x at any given y, you could use the quadratic formula. The regression equation is:

y = a*x^2 + b*x + c

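Where a, b, and c are the regression coefficients on the raw x scale. Note that poly(x.values, 2) uses orthogonal polynomials by default, so coef(M) from the model above does not return a, b, and c directly. Refit with poly(x.values, 2, raw=TRUE) to get them; coef() then returns them in the order c, b, a (the reverse of the order in the equation).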

0 = a*x^2 + b*x + (c - y)

Now just apply the quadratic formula to get the two values of x for any given value of y (where y is constrained to be in the range of the regression function), noting that the c coefficient in the standard quadratic formula is here replaced by c - y.
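The quadratic-formula approach could be sketched as follows, assuming the DF from the question and a refit with raw = TRUE so that the coefficients are on the x and x^2 scale. The names M_raw, y_target, x_roots, and x_num are illustrative; the uniroot() call is only a numerical cross-check, not part of the method described above.

```r
# Data from the question
DF <- data.frame(
  x.values = c(0, 2, 4, 6),
  y.values = c(1.0000000, 0.5443318, 0.4098546, 0.3499007)
)

# raw = TRUE gives coefficients for 1, x, x^2 (not orthogonal polynomials)
M_raw <- glm(y.values ~ poly(x.values, 2, raw = TRUE), data = DF)
cf <- unname(coef(M_raw))   # order: intercept (c), x (b), x^2 (a)
c0 <- cf[1]
b  <- cf[2]
a  <- cf[3]

# Solve a*x^2 + b*x + (c - y) = 0 for the target y
y_target <- 0.5
disc <- b^2 - 4 * a * (c0 - y_target)
stopifnot(disc >= 0)        # y_target must be in the range of the parabola
x_roots <- sort((-b + c(-1, 1) * sqrt(disc)) / (2 * a))
print(x_roots)

# Numerical cross-check within the observed range of x.values
f <- function(x) predict(M_raw, newdata = data.frame(x.values = x)) - y_target
x_num <- uniroot(f, interval = range(DF$x.values))$root
print(x_num)
```

For y = 0.5 this gives two roots; only the smaller one (roughly 2.58) lies inside the observed range of x.values, and it agrees with the uniroot() result.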
