jon

Reputation: 11366

R: perfect smoothing curve

I am trying to fit a smooth curve to my dataset. Is there a better smoothing curve than the one I produced with the following code?

x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
       2.2, 2.2, 2.4, 3.1, 3.3,
        3.7, 3.4, 3.2, 3.1, 2.4,
        1.8, 1.7, 1.6, 1.4)
lo <- loess(y~x)
plot(x,y)
xv <- seq(min(x), max(x), length.out = 1000)  # 1000 points is plenty for a smooth line
lines(xv, predict(lo,xv), col='blue', lwd=1)

EDITS:

I am not trying to produce a good-looking plot (that is not necessary); I want to show a smoothed trend. I am not concerned with the associated model formula, and I do not need to recover it.

Upvotes: 2

Views: 6310

Answers (3)

Ben Bolker

Reputation: 226057

I think perhaps you're looking for an interpolated smooth line, which in the case of R is probably most easily accomplished by fitting an interpolation spline? As the other answers discuss, that's not what statistical fitting is about, but there are many contexts where you want a smooth interpolated curve -- I think your terminology may have thrown people off.

Splines are more numerically stable than high-order polynomials.

x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

library(splines)

isp <- interpSpline(x,y)

xvec <- seq(min(x),max(x),length=200)  ## x values for prediction

png("isp.png")
plot(x,y)
## predict() produces a list with x and y components
lines(predict(isp,xvec),col="red")
dev.off()

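If you would rather not load the splines package, base R's spline() and splinefun() do much the same job. This is only a minimal sketch: the natural cubic spline and the 200-point grid are my own assumptions, not something the question requires.

```r
x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

## spline() returns a list with x and y components, evaluated at
## n equally spaced points spanning the range of x
sp <- spline(x, y, n = 200, method = "natural")

plot(x, y)
lines(sp, col = "red")

## splinefun() returns the interpolant as a function, so we can check
## that it reproduces the data at the original x values
f <- splinefun(x, y, method = "natural")
max_err <- max(abs(f(x) - y))  ## zero up to rounding error
```

Because this is an interpolant, max_err only reflects floating-point rounding; the curve goes through every point by construction.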

Upvotes: 4

Andrie

Reputation: 179398

As posed, the question is almost meaningless. There is no such thing as a "best" line of fit, since "best" depends on the objectives of your study. It is fairly trivial to generate a curve that passes through every single data point (e.g. an 18th-order polynomial will fit your 19 points perfectly, but will most likely be quite meaningless).

That said, you can control the amount of smoothing in a loess model by changing the span argument. The larger the value of span, the smoother the curve; the smaller the value of span, the more closely the curve follows each point.

Here is a plot with the value span=0.25:

x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

xl <- seq(1, 10, 0.125)
plot(x, y)
lines(xl, predict(loess(y~x, span=0.25), newdata=xl))

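To see the effect of span more directly, you could overlay several fits on one plot. This is a rough sketch; the particular span values and colours are arbitrary choices of mine, and very small spans may produce warnings with only 19 points.

```r
x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

spans <- c(0.3, 0.5, 0.75, 1)  ## 0.75 is the loess() default
cols <- c("red", "orange", "blue", "darkgreen")
xl <- seq(min(x), max(x), length.out = 200)

plot(x, y)
rss <- numeric(length(spans))
for (i in seq_along(spans)) {
  fit <- loess(y ~ x, span = spans[i])
  lines(xl, predict(fit, newdata = data.frame(x = xl)), col = cols[i])
  ## residual sum of squares: how far the curve is from the points
  rss[i] <- sum(residuals(fit)^2)
}
legend("topleft", legend = paste("span =", spans), col = cols, lty = 1)

names(rss) <- spans
rss
```

The residual sums of squares give a crude numeric summary of the same trade-off: the small-span fit hugs the points, while the large-span fit leaves more of the variation unexplained.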


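An alternative approach is to fit splines to your data. An interpolating spline is constrained to pass through each point, whereas a smoother such as loess is not. Note that smooth.spline(), used here, fits a smoothing spline: it penalizes roughness and so generally does not pass exactly through each point. For exact interpolation, use spline() or splinefun() instead.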

spl <- smooth.spline(x, y)
plot(x, y)
lines(predict(spl, xl))

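It is worth checking how closely smooth.spline() actually follows the data, since the amount of smoothing is chosen for you by default. A small sketch, where the df values 4 and 12 are arbitrary illustrative choices of mine:

```r
x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

## Default fit: smoothing parameter chosen by generalized cross-validation
spl <- smooth.spline(x, y)
default_df <- spl$df                            ## effective degrees of freedom
default_gap <- max(abs(predict(spl, x)$y - y))  ## largest miss at a data point

## Residual sum of squares for a fit with a requested df
rss_for_df <- function(df) {
  fit <- smooth.spline(x, y, df = df)
  sum((predict(fit, x)$y - y)^2)
}
rss_smooth <- rss_for_df(4)   ## few df: very smooth, misses points by more
rss_wiggly <- rss_for_df(12)  ## more df: closer to the points

xl <- seq(min(x), max(x), length.out = 200)
plot(x, y)
lines(predict(smooth.spline(x, y, df = 4), xl), col = "blue")
lines(predict(smooth.spline(x, y, df = 12), xl), col = "red")
```

The df argument plays a role similar to span in loess, in the opposite direction: a larger df gives a wigglier curve that sits closer to the points.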

Upvotes: 5

Spacedman

Reputation: 94172

You've got 19 points, so a polynomial up to X^18 will bullseye each of your points:

## x and y as defined in the question
xl <- seq(0, 10, length.out = 100)
p <- lm(y ~ poly(x, 18))   ## 19 coefficients for 19 points
plot(x, y)
lines(xl, predict(p, newdata = data.frame(x = xl)))

BUT that's ignoring what statistics is all about. It's about acknowledging that curves won't fit through points. It's about finding a model with a small number of parameters that explains as much as it can about the data, and leaves only noise. It's not about spearing your points with a curve: a curve so drawn has very little meaning between the data points.
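For contrast, here is a sketch of what a small model might look like. The cubic is purely an assumption for illustration; I am not claiming it is the right model for these data.

```r
x <- seq(1, 10, 0.5)
y <- c(1, 1.5, 1.6, 1.7, 2.1,
    2.2, 2.2, 2.4, 3.1, 3.3,
    3.7, 3.4, 3.2, 3.1, 2.4,
    1.8, 1.7, 1.6, 1.4)

## Cubic polynomial: intercept plus three polynomial terms
fit3 <- lm(y ~ poly(x, 3))
n_par <- length(coef(fit3))       ## 4 parameters
noise_sd <- summary(fit3)$sigma   ## residual standard error, the "noise"

xl <- seq(min(x), max(x), length.out = 200)
pred <- predict(fit3, newdata = data.frame(x = xl))

plot(x, y)
lines(xl, pred, col = "darkgreen")
```

The curve will miss most of the points, and that is the idea: the residual standard error is an estimate of how much of the scatter the model treats as noise.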

Upvotes: 4
