Reputation: 1143
This is where I'm at so far:
I have a data frame df with two columns A and B (both containing real numbers), where B is dependent on A. I plot the columns against each other:
p = ggplot(df, aes(A, B)) + geom_point()
and see that the relationship is non-linear. Adding:
p = p + geom_smooth(method = 'loess', span = 1)
gives a 'good' line of best fit. Given a new value a of A, I then use the following method to predict the value of B:
B.loess = loess(B ~ A, span = 1, data = df)
predict(B.loess, newdata = a)
So far, so good. However, I then realised I can't extrapolate using loess (presumably because it is non-parametric?!). The extrapolation seems fairly natural: the relationship looks like something of a 'power' type, e.g.:
x = c(1:10)
y = 2^x
df = data.frame(A = x, B = y)
This is where I come unstuck. Firstly, what methods can I use to plot a line of best fit to this kind of ('power') data without using loess? Pathetic attempts such as:
p = ggplot(df, aes(A, B)) + geom_point() +
geom_smooth(method = 'lm', formula = log(y) ~ x)
give me errors. Also, assuming I am actually able to plot a line of best fit that I am happy with, I am having trouble using predict in the same way I did when using loess. For example's sake, suppose I am happy with the line of best fit:
p = ggplot(df, aes(A, B)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ x)
then if I want to predict what value B would take if A were equal to 11 (theoretically 2^11), the following method does not work:
B.lm = lm(B ~ A)
predict(B.lm, newdata = 11)
Any help much appreciated. Cheers.
Upvotes: 1
Views: 2531
Reputation: 121578
First, to answer your last question: you need to pass predict a data.frame whose column names match the predictors in the model formula.
B.lm <- lm(B ~ A,data=df)
predict(B.lm, newdata = data.frame(A=11))
1
683.3333
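For the exponential-looking example data in the question, one possible approach is to fit the straight line on the log scale and back-transform the prediction. This is only a sketch under the assumption that B is strictly positive and grows roughly exponentially in A; the names B.loglm and pred are illustrative and not from the question.

```r
# Example data from the question: B = 2^A
df <- data.frame(A = 1:10, B = 2^(1:10))

# Fit a straight line to log(B) (assumes B > 0)
B.loglm <- lm(log(B) ~ A, data = df)

# predict() returns values on the log scale, so back-transform with exp()
pred <- exp(predict(B.loglm, newdata = data.frame(A = 11)))
pred
```

Because this toy data is exactly exponential, the back-transformed prediction comes out very close to 2^11 = 2048, unlike the 683.3333 from the straight-line fit on the original scale.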
As an alternative to loess you can try a higher-order polynomial regression. In this plot I compare poly~3 to loess using latticeExtra (easier for adding the xspline interpolation), with a syntax similar to ggplot2's (layer).
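library(latticeExtra)  # attaches lattice; provides layer(), panel.smoother(), ggplot2like()
library(grid)          # provides grid.xspline()
xyplot(A ~ B, data = df, par.settings = ggplot2like(),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         grid.xspline(x, y, ..., default.units = "native")  ## xspline interpolation
       }) +
  layer(panel.smoother(y ~ poly(x, 3), method = "lm"), style = 1) +  ## cubic polynomial fit
  layer(panel.smoother(y ~ x, span = 0.9), style = 2)                ## loess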
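The cubic fit drawn in the plot can also be fitted directly with lm and used with predict. The following is a minimal sketch on the question's example data; the object names B.lm, B.poly and pred3 are illustrative, and extrapolating a polynomial beyond the observed range should be treated with caution.

```r
# Example data from the question
df <- data.frame(A = 1:10, B = 2^(1:10))

# Straight-line fit and cubic polynomial fit (orthogonal polynomials via poly())
B.lm   <- lm(B ~ A, data = df)
B.poly <- lm(B ~ poly(A, 3), data = df)

# Residual sum of squares: the cubic fits the observed points more closely
deviance(B.lm)
deviance(B.poly)

# Prediction at A = 11; newdata must be a data.frame with a column named A
pred3 <- predict(B.poly, newdata = data.frame(A = 11))
pred3
```

In ggplot2 the same kind of fit can be drawn with geom_smooth(method = 'lm', formula = y ~ poly(x, 3)).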
Upvotes: 10
Reputation: 66844
The default surface for loess.control is interpolate which, unsurprisingly, doesn't allow extrapolation. The alternative, direct, allows you to extrapolate, though a question remains as to whether this is meaningful.
predict(loess(hp~disp,mtcars),newdata=1000)
[1] NA
predict(loess(hp~disp,mtcars,control=loess.control(surface="direct")),newdata=1000)
[1] -785.0545
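Applied to the example data from the question, the same switch might look like the sketch below. It assumes span = 1 as in the question; the names B.interp, B.direct, pred.interp and pred.direct are illustrative.

```r
# Example data from the question
df <- data.frame(A = 1:10, B = 2^(1:10))

# Default surface ("interpolate"): prediction outside the range of A is NA
B.interp <- loess(B ~ A, data = df, span = 1)
pred.interp <- predict(B.interp, newdata = data.frame(A = 11))

# surface = "direct": the local regression is evaluated at the new point itself
B.direct <- loess(B ~ A, data = df, span = 1,
                  control = loess.control(surface = "direct"))
pred.direct <- predict(B.direct, newdata = data.frame(A = 11))

pred.interp
pred.direct
```

The extrapolated value is whatever the local quadratic happens to give at A = 11, so it need not be anywhere near 2^11.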
Upvotes: 5