H.Traver

Reputation: 171

Equation for 95% CI on regression?

I have fitted a linear model, plotted it with a 95% confidence band, and computed 95% CIs on the model parameters as follows:

lm <- lm(cars$speed~cars$dist)

conf <- predict(lm, interval='confidence')
conf <- cbind(cars,conf)

CI <- as.data.frame(confint(lm))

library(ggplot2)

plot<-ggplot(conf,aes(dist,speed)) +
  geom_line(aes(y=fit),color='black') +
  geom_line(aes(y=lwr),color='red',linetype='dashed') +
  geom_line(aes(y=upr),color='red',linetype='dashed')
plot

What is the equation for the lower and upper limits (the red lines) on the plot? I assumed they could be calculated from the values returned by confint(). I tried computing the lwr and upr values like so, but I did not get the same result:

lower <- CI[1,1] + CI[2,1]*cars$dist
upper <- CI[1,2] + CI[2,2]*cars$dist

Upvotes: 1

Views: 163

Answers (1)

Taher A. Ghaleb

Reputation: 5240

predict.lm calculates the confidence interval for the fitted mean using the following equation:

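\hat{y}_0 \pm t_{1-\alpha/2,\,n-2} \sqrt{MS_{res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}

where S_{xx} = \sum_i (x_i - \bar{x})^2, MS_{res} = \sum_i e_i^2 / (n-2) is the residual mean square, n is the number of observations, and \alpha = 0.05 for a 95% interval,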

which can be implemented as follows:

my.lm <- lm(cars$speed~cars$dist)

# design matrix (a column of 1s plus dist) and the fitted values
X <- model.matrix(delete.response(terms(my.lm)), cars)
fit.values <- c(X %*% coef(my.lm))

data.fit <- data.frame(x=cars$dist, fit=fit.values)

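# compute t-value: this is the lower-tail quantile (negative),
# so multiplying by c(1, -1) later yields lwr first, then upr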
tval <- qt((1-0.95)/2, df=nrow(data.fit)-2)

# compute Sxx
Sxx <- sum((data.fit$x - mean(data.fit$x))^2)

# compute MSres
MSres <- sum(my.lm$residuals^2)/(nrow(data.fit)-2)

# calculate confidence interval
CI <- data.frame(t(apply(data.fit, 1, FUN =  function(row){
  sqrt(MSres * (1/nrow(data.fit) + (as.numeric(row[1]) - mean(data.fit$x))^2/Sxx)) * tval * c(1, -1) + as.numeric(row[2])
})))
names(CI) <- c("lwr","upr")

head(CI)
#        lwr      upr
#1  6.917090 10.31299
#2  8.472965 11.40620
#3  7.307526 10.58483
#4 10.764584 13.08820
#5  9.626909 12.23906
#6  8.472965 11.40620
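
As for why the confint() attempt in the question gives different values: confint() bounds each coefficient separately, while the band around the fitted line also depends on the covariance between the intercept and the slope. A minimal sketch of the same interval in matrix form, assuming the model is refitted with a data argument (the names fit, se_fit, t_crit, band and naive are only illustrative):

```r
fit <- lm(speed ~ dist, data = cars)

# Design matrix: a column of 1s and the dist column
X <- model.matrix(fit)

# Variance of each fitted mean is x0' V x0, where V = vcov(fit)
# includes the covariance between intercept and slope
se_fit <- sqrt(rowSums((X %*% vcov(fit)) * X))

# Upper-tail quantile, so lwr = fit - t * se and upr = fit + t * se
t_crit <- qt(0.975, df = df.residual(fit))
band <- data.frame(lwr = fitted(fit) - t_crit * se_fit,
                   upr = fitted(fit) + t_crit * se_fit)

# Lines drawn through the confint() endpoints, as tried in the question
ci <- confint(fit)
naive <- data.frame(lwr = ci[1, 1] + ci[2, 1] * cars$dist,
                    upr = ci[1, 2] + ci[2, 2] * cars$dist)

# Side-by-side view of the two sets of limits
head(cbind(band, naive_lwr = naive$lwr, naive_upr = naive$upr))
```

For a single predictor, x0' V x0 reduces to the MSres * (1/n + (x0 - mean(x))^2 / Sxx) term used above, so the two approaches should agree.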

You may compare the results with the ones you obtained from predict.
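
One way to run that comparison is sketched below. It recomputes the limits in vectorised form and checks them against predict() with all.equal(); the variable names (fit, mse, sxx, manual, from_predict) are assumptions made for this example, not part of the code above.

```r
# Fit with a data argument so predict() and the manual formula use the same rows
fit <- lm(speed ~ dist, data = cars)
n <- nrow(cars)

# Residual mean square and Sxx, as in the equation
mse <- sum(residuals(fit)^2) / (n - 2)
sxx <- sum((cars$dist - mean(cars$dist))^2)

# Standard error of the fitted mean at each observed dist
se_fit <- sqrt(mse * (1 / n + (cars$dist - mean(cars$dist))^2 / sxx))

# Upper-tail quantile for a 95% interval
t_crit <- qt(0.975, df = n - 2)
manual <- cbind(lwr = fitted(fit) - t_crit * se_fit,
                upr = fitted(fit) + t_crit * se_fit)

# Limits reported by predict() for the same observations
from_predict <- predict(fit, interval = "confidence")

# Compare numerically, ignoring row and column names
all.equal(unname(manual), unname(from_predict[, c("lwr", "upr")]))
```

all.equal() compares with a small numerical tolerance, which is usually more appropriate here than identical().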

Hope it helps.

Upvotes: 1
