Ansley
Ansley

Reputation: 13

How to calculate confidence bands for fitted values of a linear regression by hand and plot them?

I'm trying to plot error bands around a linear regression. I'm working with the builtin trees dataset in R. Here's my code. No lines are showing up on the plot. Please help!

xdev <- log(Volume) - mean(log(Volume))
xdev
ydev <- Girth - mean(Girth)
ydev
b1 <- sum((xdev)*(ydev))/sum(xdev^2)
b0 <- mean(Girth) - mean(log(Volume))*b1
plot(log(Volume) ~ Girth)
abline(coef(lm(log(Volume) ~ Girth)), col=2)
(paste(b0, b1))
y.hat <- b0 + b1*log(Volume)
ss.explained <- sum((y.hat - mean(Girth))^2)
ss.unexplained <- sum((y.hat - Girth)^2)
stderr.b1 <- sqrt((ss.unexplained/29)/sum(xdev^2))
stderr.b1

std.yhat <- sqrt((ss.unexplained/29)*((1/31) + (xdev^2/sum(xdev^2))))
std.yhat
upp <-  y.hat + qt(0.95, 29)*std.yhat
low <-  y.hat - qt(0.95, 29)*std.yhat
upp
low
library(lattice)
plot(log(Volume) ~ Girth, data = trees)
abline(c(b0, b1), lty=1)
points(upp ~ Girth, data=trees, type="l", lty=2)
points(low ~ Girth, data=trees, type="l", lty=2)  

the graph resulting from this code

Upvotes: 1

Views: 415

Answers (1)

jay.sf
jay.sf

Reputation: 73252

Obviously you want to calculate OLS by hand as well as predictions with confidence bands. Since I'm not familiar with the approach you are using (but you definitely mixed up Y and X in the beginning of your code), I show you how to get the confidence bands with a similar manual approach I'm more familiar with.

data(trees)  ## load trees dataset

## fit
X <- cbind(1, trees$Girth)
y <- log(trees$Volume)
beta <- solve(t(X) %*% X) %*% t(X) %*% y
y.hat <- as.vector(X %*% beta)

## std. err. y.hat
res <- y - y.hat
n <- nrow(trees); k <- ncol(X)
VCV <- 1/(n - k)*as.vector(t(res) %*% res)*solve(t(X) %*% X)
se.yhat <- sqrt(diag(X %*% VCV %*% t(X)))

## CIs
tcrit <- qt(0.975, n - k)
upp <- y.hat + tcrit*se.yhat
low <- y.hat - tcrit*se.yhat

## plot
with(trees, plot(x=Girth, y=log(Volume)))
abline(a=beta, col=2)
lines(x=trees$Girth, y=upp, lty=2)
lines(x=trees$Girth, y=low, lty=2)

enter image description here

We can compare this with the results of respective R functions, which will give us the same values and, thus, the same plot as above.

fit <- lm(log(Volume) ~ Girth, trees)
pred <- stats::predict.lm(fit, se.fit=TRUE, df=fit$df.residual)

with(trees, plot(x=Girth, y=log(Volume)))
abline(fit, col=2)
lines(x=trees$Girth, y=pred$fit + tcrit*pred$se.fit, lty=2)
lines(x=trees$Girth, y=pred$fit - tcrit*pred$se.fit, lty=2)

Since you have noted lattice, you can further crosscheck using lattice::xyplot:

lattice::xyplot(fit$fitted.values ~ log(trees$Volume), 
                panel=mosaic::panel.lmbands)

enter image description here

Note: It seems you are using attach, which is discouraged in the community, see: Why is it not advisable to use attach() in R, and what should I use instead?. I therefore used with() and `$`() instead.

Upvotes: 0

Related Questions