why do ggplot2 95%CI and prediction 95%CI calculated manually differ?

Question

I'd like to know why when calculating 95% confidence bands from a linear mixed effects model does ggplot2 produces narrower bands than when calculated manually, e.g. by following Ben Bolker's method here confidence intervals on predictions. That is, is ggplot2 giving an inaccurate representation of the model?

Here is a reproducible example using the sleepstudy dataset (modified to be structurally similar to a df that I'm working on):

data("sleepstudy") # load dataset 
height <- seq(165, 185, length.out = 18) # create vector called height
Treatment <- rep(c("Control", "Drug"), 9) # create vector called treatment
Subject <- levels(sleepstudy$Subject) # get vector of Subject
ht.subject <- data.frame(height, Subject, Treatment) 
sleepstudy <- dplyr::left_join(sleepstudy, ht.subject, by="Subject") # Append df so that each subject has its own height and treatment
sleepstudy$Treatment <- as.factor(sleepstudy$Treatment)

Generate model, add predictions to original df, and plot

m.sleep <- lmer(Reaction ~ Treatment*height + (1 + Days|Subject), data=sleepstudy)
sleepstudy$pred <- predict(m.sleep)
ggplot(sleepstudy, aes(height, pred, col=Treatment)) + geom_smooth(method="lm")[2]

Calculate confidence intervals following Bolker method

newdf <- expand.grid(height=seq(165, 185, 1),
                   Treatment=c("Control","Drug"))
newdf$Reaction <- predict(m.sleep, newdf, re.form=NA) 
modmat <- model.matrix(terms(m.sleep), newdf)
pvar1 <- diag(modmat %*% tcrossprod(vcov(m.sleep), modmat))
tvar1 <- pvar1+VarCorr(m.sleep)$Subject[1]
cmult <- 1.96

newdf <- data.frame(newdf
,plo = newdf$Reaction-cmult*sqrt(pvar1)
,phi = newdf$Reaction+cmult*sqrt(pvar1)
,tlo = newdf$Reaction-cmult*sqrt(tvar1)
,thi = newdf$Reaction+cmult*sqrt(tvar1))

# plot confidence intervals
ggplot(newdf, aes(x=height, y=Reaction, colour=Treatment)) + 
geom_point() +
geom_ribbon(aes(ymin=plo, ymax=phi, fill=Treatment), alpha=0.4)[2]

Ben Bolker · Accepted Answer

With a few tweaks, this seems consistent. The confidence intervals are indeed larger, but not enormously much larger. Keep in mind that ggplot is fitting a very different model; it is fitting separate linear (not linear mixed) models by treatment that ignore (1) repeated measures and (2) the effect of day.

It seems weird to fit a model with random slopes but no population-level slope (e.g.see here), so I added a fixed effect of Days:

m.sleep <- lmer(Reaction ~ Treatment*height + Days +
                (1 + Days|Subject),
                data=sleepstudy)

I reorganized the plotting code a little bit:

theme_set(theme_bw())
gg0 <- ggplot(sleepstudy, aes(height, colour=Treatment)) +
    geom_point(aes(y=Reaction))+
    geom_smooth(aes(y=pred), method="lm")

If you want to compute confidence intervals (which would be comparable with what lm()/ggplot2 is doing), then you probably should not add VarCorr(m.sleep)$Subject[1] to the variance (the tvar1 variable from the FAQ example is for creating prediction intervals rather than confidence intervals ...)
since I had Days in the model above, I added mean(sleepstudy$Days) to the prediction data frame.

newdf <- expand.grid(height=seq(165, 185, 1),
                     Treatment=c("Control","Drug"),
                     Days=mean(sleepstudy$Days))
newdf$Reaction <- newdf$pred <- predict(m.sleep, newdf, re.form=NA) 
modmat <- model.matrix(terms(m.sleep), newdf)
pvar1 <- diag(modmat %*% tcrossprod(vcov(m.sleep), modmat))
tvar1 <- pvar1
cmult <- 1.96

newdf <- data.frame(newdf
,plo = newdf$Reaction-cmult*sqrt(pvar1)
,phi = newdf$Reaction+cmult*sqrt(pvar1)
,tlo = newdf$Reaction-cmult*sqrt(tvar1)
,thi = newdf$Reaction+cmult*sqrt(tvar1))

gg0 + 
    geom_point(data=newdf,aes(y=Reaction)) +
    geom_ribbon(data=newdf,
                aes(ymin=plo, ymax=phi, fill=Treatment), alpha=0.4,
                colour=NA)

Comparing with the estimated slopes and standard errors:

m0 <- lm(Reaction~height*Treatment,sleepstudy)
ff <- function(m) {
    print(coef(summary(m))[-1,c("Estimate","Std. Error")],digits=2)
}

> ff(m0)
##                      Estimate Std. Error
## height                   -0.3       0.94
## TreatmentDrug          -602.2     234.01
## height:TreatmentDrug      3.5       1.34

ff(m.sleep)
##                      Estimate Std. Error
## TreatmentDrug          -55.03      425.3
## height                   0.41        1.7
## Days                    10.47        1.5
## TreatmentDrug:height     0.33        2.4

This looks consistent/about right: the mixed model is giving larger standard errors for the slope with respect to height and the height:treatment interaction. (The main effects of TreatmentDrug look crazy because they're the expected effects of treatment at height==0 ...)

As a cross-check, I can get similar answers from sjPlot::plot_model() ...

library(sjPlot)
plot_model(m.sleep, type="pred", terms=c("height","Treatment"))

why do ggplot2 95%CI and prediction 95%CI calculated manually differ?

Answers (1)

Related Questions