Reputation: 153
I have a linear model:
fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage)
I've predicted my y-value for an x-value of -6.10529 using this model which is determined to be 17.34775. Now I'm trying to create a 95% confidence interval for that quantity using the nonparametric bootstrap for 1000 iterations and the pivotal confidence interval.
So far I have:
resample = function(x) {
sample(x, size = length(x), replace = TRUE)
}
B = 1000
pred <- numeric(B)
for (ii in 1:B) {
boot <- resample(seq(1, nrow(anage)))
fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage[boot,])
pred[ii] <- predict(fit) + sample(resid(fit), size = 1)
}
quantile(pred, c(0.025, 0.975))
However, this confidence interval looks to be too large. Is there another way to find the 95% pivotal confidence interval for this value?
Upvotes: 0
Views: 1200
Reputation: 46978
You are sampling the observations but randomly adding 1 residual value back to the predicted value, which doesn't make sense.
Firstly, as pointed out in the answer by @Sirius, you can use the predict.lm
function to give you a ci and this is how it looks like, with an example dataset:
library(ggplot2)
set.seed(111)
anage = data.frame(lifespan = runif(100,1,2))
anage$Metabolic.by.mass = exp(anage$lifespan) + rnorm(100)
ggplot(anage,aes(x=log(Metabolic.by.mass),y=lifespan)) +
geom_point() + geom_smooth(method="lm")
To do the bootstrap, you sample with replacement on the whole dataframe, and obtain predicted values for all your observations. After N bootstraps you can calculate the confidence interval across all observations:
B = 1000
pred <- matrix(0,ncol=B,nrow=nrow(anage))
for (ii in 1:B) {
boot <- sample(nrow(anage),replace=TRUE)
fit <- lm(lifespan ~ log(Metabolic.by.mass), data = anage[boot,])
pred[,ii] <- predict(fit,anage)
}
ci = apply(pred,1,quantile,c(0.025, 0.975))
anage$lb = ci[1,]
anage$ub = ci[2,]
We can plot it out and it looks pretty similar to the above, because I simulated the data with gaussian noise:
ggplot(anage,aes(x=log(Metabolic.by.mass),y=lifespan)) +
geom_point() + geom_smooth(method="lm",se=FALSE) +
geom_ribbon(aes(ymin = lb,ymax=ub),alpha=0.2,fill="#a7d0cd") + theme_bw()
Upvotes: 0
Reputation: 5429
This should give you a 95% confidence interval of your prediction:
predict( fit, interval="confidence", level = 0.95 )
see ?predict.lm for more details
Upvotes: 1