How to obtain prediction intervals for linear regression in R

Question

This question probably stems from the fact that I don't fully understand what the predict() function is doing, but I'm wondering if there is a way to access the underlying prediction data so that I can get prediction intervals for a given unobserved value. Here's what I mean:

x <- rnorm(100,10)
y <- x+rnorm(100,5)

And making a linear model:

mod1 <- lm(y ~ x)

If I want the confidence intervals for the model estimates, I can do:

confint(mod1)

and get

                 2.5 %    97.5 %
(Intercept) -8.1864342 29.254714
x            0.7578651  1.132339

If I wanted to, I could plug these lower and upper bound estimates into a prediction equation to get a lower and upper confidence interval for some input of x.

What if I want to do the same, but with a prediction interval? Using

predict(mod1, interval = "prediction")

looks like it fits the model to the existing data with lower and upper bounds, but doesn't tell me which parameters those lower and upper bounds are based on so that I could use them for an unobserved value.

(I know I can technically pass a value to predict(), but I just want the underlying parameters so that I don't necessarily have to do the prediction in R.)

Ramnath · Accepted Answer

The predict function accepts a newdata argument, which lets you compute the interval for unobserved values. Here is an example:

x <- rnorm(100, 10)
y <- x + rnorm(100, 5)
d <- data.frame(x = x, y = y)

mod <- lm(y ~ x, data = d)

d2 <- data.frame(x = c(0.3, 0.6, 0.2))
predict(mod, newdata = d2, interval = 'prediction')

I'm not sure what you mean by underlying parameters. A prediction interval is not built from a fixed pair of lower and upper coefficient values: its width changes with the x value at which you predict, so you cannot reduce it to a few simple parameters.
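If the goal is to reproduce the interval outside of predict(), the following is a minimal sketch, assuming the usual textbook formula for an ordinary lm fit: fit ± t * sqrt(s^2 + x0' V x0), where s is the residual standard error, V is the covariance matrix of the coefficients, and t is the 97.5% quantile of a t distribution on the residual degrees of freedom. The seed and the names x0, X0, se_pred and manual are introduced here for illustration only and are not part of the original example.

```r
set.seed(1)                              # assumption: fixed seed so the sketch is reproducible
x <- rnorm(100, 10)
y <- x + rnorm(100, 5)
mod <- lm(y ~ x)

x0 <- c(0.3, 0.6, 0.2)                   # new x values to predict at
X0 <- cbind(1, x0)                       # design rows: intercept column plus x

fit   <- as.vector(X0 %*% coef(mod))     # point predictions
s2    <- sigma(mod)^2                    # residual variance
# variance of a new observation = residual variance + variance of the fitted mean
se_pred <- sqrt(s2 + rowSums((X0 %*% vcov(mod)) * X0))
tcrit <- qt(0.975, df.residual(mod))     # two-sided 95% critical value

manual <- cbind(fit = fit,
                lwr = fit - tcrit * se_pred,
                upr = fit + tcrit * se_pred)

# compare against the built-in computation
builtin <- predict(mod, newdata = data.frame(x = x0), interval = "prediction")
all.equal(unname(manual), unname(builtin))
```

So the quantities you would need to carry outside of R are coef(mod), vcov(mod), sigma(mod) and df.residual(mod); the interval itself still has to be recomputed for each new x.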
