user32259

Reputation: 1143

A replacement for method = 'loess'

This is where I'm at so far:

I have a data frame df with two columns A and B (both containing real numbers), where B depends on A. I plot the columns against each other:

p = ggplot(df, aes(A, B)) + geom_point()

and see that the relationship is non-linear. Adding:

p = p + geom_smooth(method = 'loess', span = 1)

gives a 'good' line of best fit. Given a new value a of A I then use the following method to predict the value of B:

B.loess = loess(B ~ A, span = 1, data = df)
predict(B.loess, newdata = a)

So far, so good. However, I then realise I can't extrapolate using loess (presumably because it is non-parametric?!). The extrapolation seems fairly natural: the relationship looks like some kind of power law is going on, e.g.:

x = c(1:10)
y = 2^x
df = data.frame(A = x, B = y)

This is where I get unstuck. Firstly, what methods can I use to plot a line of best fit to this kind of ('power') data without using loess? Pathetic attempts such as:

p = ggplot(df, aes(A, B)) + geom_point() +
      geom_smooth(method = 'lm', formula = log(y) ~ x)

give me errors. Also, assuming I am actually able to plot a line of best fit that I am happy with, I am having trouble using predict in a similar way to how I did with loess. For example's sake, suppose I am happy with the line of best fit:

p = ggplot(df, aes(A, B)) + geom_point() +
      geom_smooth(method = 'lm', formula = y ~ x)

then if I want to predict what value B would take if A were equal to 11 (theoretically 2^11), the following method does not work:

B.lm = lm(B ~ A)
predict(B.lm, newdata = 11)

Any help much appreciated. Cheers.

Upvotes: 1

Views: 2531

Answers (2)

agstudy

Reputation: 121578

First, to answer your last question: you need to give predict a data.frame whose column names are the predictors.

B.lm <- lm(B ~ A,data=df)
predict(B.lm, newdata = data.frame(A=11))

     1 
683.3333 

As an alternative to loess you can try a higher-degree polynomial regression. In this plot I compare a degree-3 polynomial fit to loess using latticeExtra (which makes it easier to add the xspline interpolation), but with a layer syntax similar to ggplot2's.

xyplot(B ~ A, data = df, par.settings = ggplot2like(),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         grid.xspline(x, y, ..., default.units = "native")  ## xspline interpolation
       }) +
  layer(panel.smoother(y ~ poly(x, 3), method = "lm"), style = 1) +  ## polynomial fit
  layer(panel.smoother(y ~ x, span = 0.9), style = 2)                ## loess

[plot: degree-3 polynomial fit and loess smoother overlaid on the data]
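Since the example data in the question follow B = 2^A exactly, a log-linear fit is another option worth noting (a sketch of the standard approach, not part of the original answer): fit the linear model on the log scale, then exponentiate the prediction to get back to the original scale.

```r
# Example data from the question: B = 2^A
df <- data.frame(A = 1:10, B = 2^(1:10))

# Fit a linear model on the log scale: log(B) = a + b*A
B.loglm <- lm(log(B) ~ A, data = df)

# Predict B at A = 11 by exponentiating the log-scale prediction
exp(predict(B.loglm, newdata = data.frame(A = 11)))
# recovers ~2048 (= 2^11), since the data are exactly exponential
```

This also addresses the geom_smooth attempt in the question: the fit itself is fine, but predictions come back on the log scale and must be exponentiated before comparing to B.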

Upvotes: 10

James

Reputation: 66844

The default surface for loess.control is "interpolate", which, unsurprisingly, doesn't allow extrapolation. The alternative, "direct", allows you to extrapolate, though a question remains as to whether that extrapolation is meaningful.

predict(loess(hp ~ disp, mtcars), newdata = 1000)
[1] NA
predict(loess(hp ~ disp, mtcars, control = loess.control(surface = "direct")), newdata = 1000)
[1] -785.0545
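Applied to the question's own data (a sketch assuming the df from the question), the same control argument lets the original loess fit extrapolate past the observed range of A:

```r
# Data from the question: B = 2^A for A in 1..10
df <- data.frame(A = 1:10, B = 2^(1:10))

# surface = "direct" permits prediction beyond the range of A
B.loess <- loess(B ~ A, span = 1, data = df,
                 control = loess.control(surface = "direct"))
predict(B.loess, newdata = data.frame(A = 11))
```

Whether the extrapolated value is close to 2^11 is another matter; loess only fits local polynomials, so it has no notion of the global exponential trend.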

Upvotes: 5
