Reputation: 45
> DF
  x.values  y.values
1        0 1.0000000
2        2 0.5443318
3        4 0.4098546
4        6 0.3499007
ggplot(DF, aes(x=x.values, y=y.values)) +
geom_point() +
geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))
Gives me a polynomial fit to the data which looks like this:
(plot of the four points with the quadratic smooth through them; source: preview.ibb.co)
From the image I can visually estimate the interpolated x.value for y.value=0.5 to be ~2.5-2.6.
However, when I estimate the interpolated x.value outside of ggplot, I get a value of 2.78.
M <- glm(formula = y.values ~ poly(x.values,2), data = DF)
t0.5 <- approx(x = M$fitted, y = DF$x.values, xout=0.50)$y
t0.5
[1] 2.780246
Can anyone please explain this discrepancy?
Upvotes: 1
Views: 1684
Reputation: 93811
The model is predicting y.values from x.values, so the fitted values of the model are y.values, not x.values. Thus, the code should be t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y. After making this change, you can see that linear interpolation and model prediction are what one would expect by visual inspection of the plot.
p = ggplot(DF, aes(x=x.values, y=y.values)) +
geom_point() +
geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))
M <- glm(formula = y.values ~ poly(x.values,2), data = DF)
# linear interpolation of fitted values at x.values=0.5
t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y
# glm model prediction at x.values=0.5
predy = predict(M, newdata=data.frame(x.values=0.5))
# Data frame with linear interpolation of predictions along the full range of x.values
interp.fit = as.data.frame(approx(x = DF$x.values, y = fitted(M),
                                  xout = seq(min(DF$x.values), max(DF$x.values), length = 100)))
p +
geom_line(data=interp.fit, aes(x,y), colour="red", size=0.7) +
annotate(x=0.5, y=t0.5, geom="point", shape=3, colour="red", size=4) +
annotate(x=0.5, y=predy, geom="point", shape=16, colour="purple", size=4)
In response to the comment: To calculate x at any given y, you could use the quadratic formula. The regression equation is:
y = a*x^2 + b*x + c
where a, b, and c are the regression coefficients. (Note that coef(M) returns them in the reverse order, intercept first. Also, poly() defaults to an orthogonal polynomial basis, so the coefficients of the model above are not a, b, and c directly; refit with poly(x.values, 2, raw=TRUE) to read off the raw coefficients.) Moving y to the right-hand side gives:
0 = a*x^2 + b*x + (c - y)
Now just apply the quadratic formula to get the two values of x for any given value of y (where y is constrained to be in the range of the regression function), noting that the c coefficient in the standard quadratic formula is here replaced by c - y.
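As a sketch of that calculation in R, using the DF from the question (note the poly(..., raw=TRUE) refit is my assumption for extracting raw coefficients, since poly() defaults to an orthogonal basis):

```r
# Refit with raw = TRUE so coef() returns the raw coefficients:
# intercept (c), linear (b), quadratic (a), in that order
M.raw <- glm(y.values ~ poly(x.values, 2, raw = TRUE), data = DF)
cf <- coef(M.raw)
c0 <- cf[1]; b <- cf[2]; a <- cf[3]

# Solve a*x^2 + b*x + (c0 - y) = 0 for a target y via the quadratic formula
y.target <- 0.5
disc  <- b^2 - 4 * a * (c0 - y.target)
roots <- (-b + c(1, -1) * sqrt(disc)) / (2 * a)
roots  # the two x values at which the fitted quadratic equals y.target
```

Only the root that falls inside the observed range of x.values is meaningful here; the other solution of the quadratic typically lies outside the data.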
Upvotes: 3