Reputation: 31
I need to graph a confidence interval for a prediction I ran. The prediction itself runs, but when I plot it I get a line through all of my data points instead of the actual confidence interval.
GunRate <- seq(0, 100, length = 51)
LinearPredictionA <- predict(ModelA,
                             interval = "confidence",
                             newdata = data.frame(ProportionAdultsLivingWithGun = GunRate,
                                                  LogMedianIncome = FinalSet$LogMedianIncome,
                                                  PctofPeopleinMetro = FinalSet$PctofPeopleinMetro,
                                                  PovertyRate = FinalSet$PovertyRate))
## This is my prediction model
plot(x = FinalSet$ProportionAdultsLivingWithGun,
     y = FinalSet$ViolentCrime1K,
     col = "red",
     xlim = c(0, 80), ylim = c(0, 15),
     xlab = "Proportion of Adults Living With a Gun",
     ylab = "Violent Crime Rate per 1000",
     main = "Violent Crime vs. Gun Ownership",
     sub = "All 50 States & D.C.")
## This plot shows the actual data we used to obtain the prediction
lines(GunRate, LinearPredictionA[, "fit"], type = "l")
lines(GunRate, LinearPredictionA[, "lwr"], lty = "dashed", col = "green")
lines(GunRate, LinearPredictionA[, "upr"], lty = "dashed", col = "green")
These lines() calls are supposed to graph my CI, but instead I get a line through all of my data points.
Upvotes: 3
Views: 8009
Reputation: 93871
Here's an example of what's going wrong, using the built-in mtcars data frame:
# Regression model
m1 = lm(mpg ~ wt + hp + cyl, data=mtcars)
Now let's get predictions of mpg vs. wt, but with 2 different alternating values of hp and 3 different alternating values of cyl:
predData = data.frame(wt=seq(1,5,length=60), hp=rep(c(200,300), 30), cyl=rep(c(4,6,8), 20))
predData = cbind(predData, predict(m1, newdata=predData, interval="confidence"))
Note how the prediction jumps around, because hp and cyl change for each successive value of wt:
plot(predData$wt, predData$fit, type="l")
lines(predData$wt, predData$lwr, type="l", col="red")
lines(predData$wt, predData$upr, type="l", col="red")
But when we keep hp and cyl fixed, we get a straight-line prediction for mpg vs. wt:
predData2 = data.frame(wt=seq(1,5,length=60), hp=rep(300,60), cyl=rep(6, 60))
predData2 = cbind(predData2, predict(m1, newdata=predData2, interval="confidence"))
plot(predData2$wt, predData2$fit, type="l")
lines(predData2$wt, predData2$lwr, type="l", col="red")
lines(predData2$wt, predData2$upr, type="l", col="red")
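The same idea carries over to the question: there, LogMedianIncome, PctofPeopleinMetro and PovertyRate take a different value in every row of newdata, so the fit jumps around just like above. A common choice (an assumption here, not something the question specifies) is to hold the other predictors at their sample means and vary only the predictor of interest. Here is a minimal sketch with mtcars; predMean is a name made up for this illustration. For ModelA the analogous newdata would presumably use mean(FinalSet$LogMedianIncome) and so on in place of the full columns.

```r
# Sketch: vary wt over a grid and hold the other predictors at their sample means.
# Holding them at the mean is an illustrative choice; any fixed values give a straight line.
m1 <- lm(mpg ~ wt + hp + cyl, data = mtcars)

predMean <- data.frame(
  wt  = seq(1, 5, length.out = 60),
  hp  = mean(mtcars$hp),   # single value, recycled across all 60 rows
  cyl = mean(mtcars$cyl)   # cyl is numeric in this model, so a mean is allowed
)
predMean <- cbind(predMean,
                  predict(m1, newdata = predMean, interval = "confidence"))

# Observed data first, then the fitted line and its confidence limits on top
plot(mtcars$wt, mtcars$mpg, col = "red", xlab = "wt", ylab = "mpg")
lines(predMean$wt, predMean$fit)
lines(predMean$wt, predMean$lwr, lty = "dashed", col = "green")
lines(predMean$wt, predMean$upr, lty = "dashed", col = "green")
```

Because hp and cyl are constant across the rows, the fitted values change only with wt, so the line is straight and the dashed limits bracket it smoothly.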
Instead of a single line, you can also plot predicted mpg vs. wt lines for several values of another variable. Below is an example where we plot a line for each value of cyl that we used to create predData. This is easier with ggplot2, so I've used that package. Using lines for the confidence intervals would make the plot difficult to understand, so I've shown the CI with a fill instead:
library(ggplot2)
ggplot(subset(predData, hp==200), aes(wt, fit, fill=factor(cyl), colour=factor(cyl))) +
  geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=0.2, colour=NA) +
  geom_line() +
  labs(x="Weight", y="Predicted MPG", colour="Cylinders", fill="Cylinders") +
  theme_bw()
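If you'd rather stay in base graphics, a filled band can be drawn with polygon(). This is only a sketch for a single line with hp and cyl held fixed; band is a name made up for this illustration.

```r
# Sketch: shaded confidence band in base graphics, with hp and cyl held fixed.
m1 <- lm(mpg ~ wt + hp + cyl, data = mtcars)

band <- data.frame(wt = seq(1, 5, length.out = 60), hp = 300, cyl = 6)
band <- cbind(band, predict(m1, newdata = band, interval = "confidence"))

# Empty plot sized to the band, so the limits are not clipped
plot(band$wt, band$fit, type = "n",
     ylim = range(band$lwr, band$upr),
     xlab = "Weight", ylab = "Predicted MPG")

# Trace the lower limit left to right, then the upper limit right to left
polygon(c(band$wt, rev(band$wt)),
        c(band$lwr, rev(band$upr)),
        col = adjustcolor("red", alpha.f = 0.2), border = NA)
lines(band$wt, band$fit)
```

Setting ylim from the lwr and upr columns matters here: with the default limits taken from fit alone, part of the band can fall outside the plotting region.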
Upvotes: 3