Jane Miller
Jane Miller

Reputation: 153

trying to find the 95% confidence interval for a predicted value using bootstrap R

I have a linear model:

fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage)

I've predicted my y-value for an x-value of -6.10529 using this model which is determined to be 17.34775. Now I'm trying to create a 95% confidence interval for that quantity using the nonparametric bootstrap for 1000 iterations and the pivotal confidence interval.

So far I have:

resample = function(x) {
  sample(x, size = length(x), replace = TRUE)
}

B = 1000
pred <- numeric(B)
for (ii in 1:B) {
   boot <- resample(seq(1, nrow(anage)))
   fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage[boot,])
   pred[ii] <- predict(fit) + sample(resid(fit), size = 1)
}
quantile(pred, c(0.025, 0.975))

However, this confidence interval looks to be too large. Is there another way to find the 95% pivotal confidence interval for this value?

Upvotes: 0

Views: 1200

Answers (2)

StupidWolf
StupidWolf

Reputation: 46978

You are sampling the observations but randomly adding 1 residual value back to the predicted value, which doesn't make sense.

Firstly, as pointed out in the answer by @Sirius, you can use the predict.lm function to give you a ci and this is how it looks like, with an example dataset:

library(ggplot2)
set.seed(111)

anage = data.frame(lifespan = runif(100,1,2))
anage$Metabolic.by.mass = exp(anage$lifespan) + rnorm(100)

ggplot(anage,aes(x=log(Metabolic.by.mass),y=lifespan)) + 
geom_point() + geom_smooth(method="lm")

enter image description here

To do the bootstrap, you sample with replacement on the whole dataframe, and obtain predicted values for all your observations. After N bootstraps you can calculate the confidence interval across all observations:

B = 1000
pred <- matrix(0,ncol=B,nrow=nrow(anage))
for (ii in 1:B) {
   boot <- sample(nrow(anage),replace=TRUE)
   fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage[boot,])
   pred[,ii] <- predict(fit,anage)
}

ci = apply(pred,1,quantile,c(0.025, 0.975))
anage$lb = ci[1,]
anage$ub = ci[2,]

We can plot it out and it looks pretty similar to the above, because I simulated the data with gaussian noise:

ggplot(anage,aes(x=log(Metabolic.by.mass),y=lifespan)) + 
geom_point() + geom_smooth(method="lm",se=FALSE) + 
geom_ribbon(aes(ymin = lb,ymax=ub),alpha=0.2,fill="#a7d0cd") + theme_bw()

enter image description here

Upvotes: 0

Sirius
Sirius

Reputation: 5429

This should give you a 95% confidence interval of your prediction:


predict( fit, interval="confidence", level = 0.95 )

see ?predict.lm for more details

Upvotes: 1

Related Questions