psteelk
psteelk

Reputation: 1365

Plot polynomial regression curve in R

I have a simple polynomial regression which I do as follows

attach(mtcars)
fit <- lm(mpg ~ hp + I(hp^2))

Now, I plot as follows

> plot(mpg~hp)
> points(hp, fitted(fit), col='red', pch=20)

This gives me the following

Plot of mpg versus hp

Fitted Values

I want to connect these points into a smooth curve, using lines gives me the following

> lines(hp, fitted(fit), col='red', type='b')

Line plot

What am I missing here. I want the output to be a smooth curve which connects the points

Upvotes: 36

Views: 88870

Answers (4)

benetsugarboy
benetsugarboy

Reputation: 21

I found a funny solution.

plot(mpg ~ hp, data = mtcars)
curve(expr = cbind(1, poly(x, degree = 2, raw = TRUE)) %*% fit$coefficients,
      add = TRUE, col = "red")

Essentially, it simply applies the matrix definition of a linear regression.

a %*% coef = b

Here, the matrix a can be expressed either way:

cbind(1, poly(x, degree = 2, raw = TRUE))
t(sapply(x, `^`, e2 = 0:2))

And this works because curve takes expressions where x is the variable.

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 145755

Generally a good way to go is to use the predict() function. Pick some x values, use predict() to generate corresponding y values, and plot them. It can look something like this:

newdat = data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
newdat$pred = predict(fit, newdata = newdat)

plot(mpg ~ hp, data = mtcars)
with(newdat, lines(x = hp, y = pred))

enter image description here

See Roman's answer for a fancier version of this method, where confidence intervals are calculated too. In both cases the actual plotting of the solution is incidental - you can use base graphics or ggplot2 or anything else you'd like - the key is just use the predict function to generate the proper y values. It's a good method because it extends to all sorts of fits, not just polynomial linear models. You can use it with non-linear models, GLMs, smoothing splines, etc. - anything with a predict method.

Upvotes: 17

Roman Luštrik
Roman Luštrik

Reputation: 70623

I like to use ggplot2 for this because it's usually very intuitive to add layers of data.

library(ggplot2)
fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)
prd <- data.frame(hp = seq(from = range(mtcars$hp)[1], to = range(mtcars$hp)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)

prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit

ggplot(prd, aes(x = hp, y = fit)) +
  theme_bw() +
  geom_line() +
  geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
  geom_point(data = mtcars, aes(x = hp, y = mpg))

enter image description here

Upvotes: 36

Davide Passaretti
Davide Passaretti

Reputation: 2771

Try:

lines(sort(hp), fitted(fit)[order(hp)], col='red', type='b') 

Because your statistical units in the dataset are not ordered, thus, when you use lines it's a mess.

Upvotes: 33

Related Questions