Reputation: 57
I've trained a fairly complex (random intercept and slope) mixed logistic model which I'm then using to predict new data. My predictions were way off, so I compared the predicted values for my original data against fitted.values() and noticed that they are very different. E.g. for just the first data point, the fitted value is 0.359, while the predicted value (predict(model, newdata=data, type="response")) is 0.0585. I'm pretty sure the issue relates to the random effects, because if I predict the same data point with only the fixed effects I get 0.707, which is what I'd expect, whereas the random effects alone give 0.0252, which is very wrong.
You can see the different random-effect groups by plotting fitted vs. predicted values (plot not shown). This also shows that the gradients within groups are the same, so it looks like a problem with the intercepts rather than the slopes.
Any help would be really appreciated!
EDIT: In case it helps, the formula basically looks like this:
y ~ (1 | re1) +
(1 | re2) +
(1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2) +
(1 | re4:re1) +
(1 | re4:re2) +
(1 | re5:re2) +
(1 | re5:re1) +
(1 + rs2 | re1) +
(1 + rs2 | re2) +
(1 + rs3 | re1) +
(1 + rs3 | re2)
EDIT 2: Here is a reprex. I tried this with a few different seeds and the size of the discrepancy varies quite a lot; e.g. with seed 42 the two sets of values look almost identical (but are still not quite the same).
library(lme4)
set.seed(25)

# Simulate a binary response, grouping factors, and covariates
y   <- as.factor(round(runif(1000, 0, 1)))
re1 <- as.factor(round(runif(1000, 1, 5)))
re2 <- as.factor(round(runif(1000, 1, 4)))
re3 <- as.factor(round(runif(1000, 0, 1)))
fe1 <- runif(1000, 0, 1)
fe2 <- runif(1000, 0, 1)
rs1 <- runif(1000, 0, 1)
rs2 <- runif(1000, 0, 1)

df <- data.frame(y, re1, re2, re3, fe1, fe2, rs1, rs2)

model <- glmer(y ~ (1 | re1) +
                 (1 | re2) +
                 (1 | re3) +
                 fe1 + fe2 +
                 (1 + rs1 | re1) +
                 (1 + rs1 | re2),
               family = "binomial", data = df)

plot(fitted.values(model), predict(model, newdata = df, type = "response"))
Upvotes: 2
Views: 1553
Reputation: 6812
The model:
glmer(y ~ (1 | re1) +
(1 | re2) +
(1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2), family="binomial", data=df)
is pathological: the terms (1 | re1) and (1 | re2) are redundant, because (1 + rs1 | re1) and (1 + rs1 | re2) already contain random intercepts for re1 and re2. If the intention was to estimate uncorrelated random intercepts and slopes, you can do that with (1 | re1) + (0 + rs1 | re1) and (1 | re2) + (0 + rs1 | re2).
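Applied to the reprex, the uncorrelated version could be written with lme4's double-bar shorthand, where (1 + rs1 || re1) expands to (1 | re1) + (0 + rs1 | re1). This is a sketch assuming the df from the reprex above:

```r
# Uncorrelated random intercepts and slopes via the double-bar syntax;
# (1 + rs1 || re1) is equivalent to (1 | re1) + (0 + rs1 | re1).
# Note: || only splits terms cleanly for numeric covariates like rs1.
model_uncorr <- glmer(y ~ fe1 + fe2 +
                        (1 | re3) +
                        (1 + rs1 || re1) +
                        (1 + rs1 || re2),
                      family = "binomial", data = df)
```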
If you remove these redundant terms:
model <- glmer(y ~ (1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2), family="binomial", data=df)
then you obtain the same values from fitted.values and predict:
plot(fitted.values(model), predict(model, newdata=df, type="response"))
This model is singular (and so is the original one), but that is a different problem.
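You can check for singularity directly; isSingular() reports whether any variance or correlation parameters were estimated at the boundary of their space:

```r
# TRUE when some variance component is ~0 or a correlation is ~ +/-1
isSingular(model, tol = 1e-4)

# Inspect the estimated variance components to see which ones collapsed
VarCorr(model)
```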
Upvotes: 1