Reputation: 43
I'm new to R and I wanted to ask about how to obtain a confidence interval for the difference between two predicted values estimated using predict() function. For example,
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
summary(model)
prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence")
prediction1
prediction2 = predict(model, newdata = list(speed = 20), interval = "confidence")
prediction2
predict.dif = prediction2 - prediction1
predict.dif
what I got is
prediction1 = 29.60981 (lwr)24.39514 (upr)34.82448
prediction2 = 61.06908 (lwr)55.24729 (upr)66.89088
and for the difference in prediction I get
predict.diff = 31.45927 (lwr)30.85215 (upr)32.06639
My question is, in the previous example, I get the difference in predictions and R also subtracts the confidence intervals. Is it correct to subtract CIs this way? and why? and if not, I was wondering if there is a way to calculate such CI for the difference in predictions.
Thank you so much
Upvotes: 0
Views: 955
Reputation: 1495
You pose a good question. In summary, we should not compute confidence intervals separately and then take the differences between their respective lower bounds and respective upper bounds.
Why?
To explain why, rather than getting too much into the math, I'll illustrate via an example:
Let's take your code snippet and just change prediction2
to be a prediction when speed equals 13:
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
summary(model)
prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence")
prediction1
prediction2 = predict(model, newdata = list(speed = 13), interval = "confidence")
prediction2
predict.dif = prediction2 - prediction1
predict.dif
In this case your output would be:
fit lwr upr
1 3.932409 4.336198 3.52862
But question yourself as to what we're doing here. You have a model of the form dist = a + b*speed
. You then compute the model predictions at speed = 12
and speed = 13
and take the difference. So that gives (a + b*13) - (a + b*12) = b
. Therefore, the confidence interval for the difference should be equal to the confidence interval for my parameter b
. Let's check whether this is the case:
confint(model)
2.5 % 97.5 %
(Intercept) -31.167850 -3.990340
speed 3.096964 4.767853
You can see that they are not equal. In your approach, the interval is (4.336198, 3.52862)
vs the second approach where the interval is (3.096964, 4.767853)
If I find a good/easy to follow reference on the math behind this, I will link in the comments.
Solution
You can leverage the contrast
package for this purpose. Using this package, we can call:
contrast(model, a = list(speed = 13), b = list(speed = 12))
and this would output the required confidence interval for the difference:
lm model parameter contrast
Contrast S.E. Lower Upper t df Pr(>|t|)
1 3.932409 0.4155128 3.096964 4.767853 9.46 48 0
Getting back to your example, we can do
contrast(model, a = list(speed = 20), b = list(speed = 12))
to get
lm model parameter contrast
Contrast S.E. Lower Upper t df Pr(>|t|)
1 31.45927 3.324102 24.77571 38.14283 9.46 48 0
Upvotes: 1