Reputation: 112
I'd like to know why when calculating 95% confidence bands from a linear mixed effects model does ggplot2 produces narrower bands than when calculated manually, e.g. by following Ben Bolker's method here confidence intervals on predictions. That is, is ggplot2 giving an inaccurate representation of the model?
Here is a reproducible example using the sleepstudy dataset (modified to be structurally similar to a df that I'm working on):
data("sleepstudy") # load dataset
height <- seq(165, 185, length.out = 18) # create vector called height
Treatment <- rep(c("Control", "Drug"), 9) # create vector called treatment
Subject <- levels(sleepstudy$Subject) # get vector of Subject
ht.subject <- data.frame(height, Subject, Treatment)
sleepstudy <- dplyr::left_join(sleepstudy, ht.subject, by="Subject") # Append df so that each subject has its own height and treatment
sleepstudy$Treatment <- as.factor(sleepstudy$Treatment)
Generate model, add predictions to original df, and plot
m.sleep <- lmer(Reaction ~ Treatment*height + (1 + Days|Subject), data=sleepstudy)
sleepstudy$pred <- predict(m.sleep)
ggplot(sleepstudy, aes(height, pred, col=Treatment)) + geom_smooth(method="lm")[2]
Calculate confidence intervals following Bolker method
newdf <- expand.grid(height=seq(165, 185, 1),
Treatment=c("Control","Drug"))
newdf$Reaction <- predict(m.sleep, newdf, re.form=NA)
modmat <- model.matrix(terms(m.sleep), newdf)
pvar1 <- diag(modmat %*% tcrossprod(vcov(m.sleep), modmat))
tvar1 <- pvar1+VarCorr(m.sleep)$Subject[1]
cmult <- 1.96
newdf <- data.frame(newdf
,plo = newdf$Reaction-cmult*sqrt(pvar1)
,phi = newdf$Reaction+cmult*sqrt(pvar1)
,tlo = newdf$Reaction-cmult*sqrt(tvar1)
,thi = newdf$Reaction+cmult*sqrt(tvar1))
# plot confidence intervals
ggplot(newdf, aes(x=height, y=Reaction, colour=Treatment)) +
geom_point() +
geom_ribbon(aes(ymin=plo, ymax=phi, fill=Treatment), alpha=0.4)[2]
Upvotes: 4
Views: 595
Reputation: 226192
With a few tweaks, this seems consistent. The confidence intervals are indeed larger, but not enormously much larger. Keep in mind that ggplot is fitting a very different model; it is fitting separate linear (not linear mixed) models by treatment that ignore (1) repeated measures and (2) the effect of day.
It seems weird to fit a model with random slopes but no population-level slope (e.g.see here), so I added a fixed effect of Days
:
m.sleep <- lmer(Reaction ~ Treatment*height + Days +
(1 + Days|Subject),
data=sleepstudy)
I reorganized the plotting code a little bit:
theme_set(theme_bw())
gg0 <- ggplot(sleepstudy, aes(height, colour=Treatment)) +
geom_point(aes(y=Reaction))+
geom_smooth(aes(y=pred), method="lm")
lm()
/ggplot2
is doing), then you probably should not add VarCorr(m.sleep)$Subject[1]
to the variance (the tvar1
variable from the FAQ example is for creating prediction intervals rather than confidence intervals ...)Days
in the model above, I added mean(sleepstudy$Days)
to the prediction data frame.newdf <- expand.grid(height=seq(165, 185, 1),
Treatment=c("Control","Drug"),
Days=mean(sleepstudy$Days))
newdf$Reaction <- newdf$pred <- predict(m.sleep, newdf, re.form=NA)
modmat <- model.matrix(terms(m.sleep), newdf)
pvar1 <- diag(modmat %*% tcrossprod(vcov(m.sleep), modmat))
tvar1 <- pvar1
cmult <- 1.96
newdf <- data.frame(newdf
,plo = newdf$Reaction-cmult*sqrt(pvar1)
,phi = newdf$Reaction+cmult*sqrt(pvar1)
,tlo = newdf$Reaction-cmult*sqrt(tvar1)
,thi = newdf$Reaction+cmult*sqrt(tvar1))
gg0 +
geom_point(data=newdf,aes(y=Reaction)) +
geom_ribbon(data=newdf,
aes(ymin=plo, ymax=phi, fill=Treatment), alpha=0.4,
colour=NA)
Comparing with the estimated slopes and standard errors:
m0 <- lm(Reaction~height*Treatment,sleepstudy)
ff <- function(m) {
print(coef(summary(m))[-1,c("Estimate","Std. Error")],digits=2)
}
> ff(m0)
## Estimate Std. Error
## height -0.3 0.94
## TreatmentDrug -602.2 234.01
## height:TreatmentDrug 3.5 1.34
ff(m.sleep)
## Estimate Std. Error
## TreatmentDrug -55.03 425.3
## height 0.41 1.7
## Days 10.47 1.5
## TreatmentDrug:height 0.33 2.4
This looks consistent/about right: the mixed model is giving larger standard errors for the slope with respect to height and the height:treatment interaction. (The main effects of TreatmentDrug
look crazy because they're the expected effects of treatment at height==0
...)
As a cross-check, I can get similar answers from sjPlot::plot_model()
...
library(sjPlot)
plot_model(m.sleep, type="pred", terms=c("height","Treatment"))
Upvotes: 4