Reputation: 57
I've trained a fairly complex (random intercept and slope) mixed logistic model which I'm then using to predict new data. My predictions were way off, so I compared the predicted values for my original data against fitted.values() and noticed that they are very different. E.g. for just the first data point, the fitted value is 0.359, while the predicted value (predict(model, newdata=data, type="response")) is 0.0585. I'm pretty sure the issue relates to the random effects, because if I predict the same data point with only the fixed effects I get 0.707, which is what I'd expect, whereas the random effects alone give 0.0252, which is very wrong.
You can see the different random-effect groups by plotting fitted vs. predicted values (plot not shown). This also shows that the gradients within groups are the same, so it looks like a problem with the intercepts rather than the slopes.
Any help would be really appreciated!
EDIT: In case it helps, the formula basically looks like this:
y ~ (1 | re1) +
(1 | re2) +
(1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2) +
(1 | re4:re1) +
(1 | re4:re2) +
(1 | re5:re2) +
(1 | re5:re1) +
(1 + rs2 | re1) +
(1 + rs2 | re2) +
(1 + rs3 | re1) +
(1 + rs3 | re2)
EDIT 2: Here is a reprex. I tried this with a few different seeds and the size of the discrepancy varies quite a lot; e.g. with seed 42 the two sets of values look almost identical (but are still not quite the same).
library(lme4)
set.seed(25)

# Simulate a binary response, grouping factors, and covariates
y   <- as.factor(round(runif(1000, 0, 1)))
re1 <- as.factor(round(runif(1000, 1, 5)))
re2 <- as.factor(round(runif(1000, 1, 4)))
re3 <- as.factor(round(runif(1000, 0, 1)))
fe1 <- runif(1000, 0, 1)
fe2 <- runif(1000, 0, 1)
rs1 <- runif(1000, 0, 1)
rs2 <- runif(1000, 0, 1)

df <- data.frame(y, re1, re2, re3, fe1, fe2, rs1, rs2)

model <- glmer(y ~ (1 | re1) +
                 (1 | re2) +
                 (1 | re3) +
                 fe1 + fe2 +
                 (1 + rs1 | re1) +
                 (1 + rs1 | re2),
               family = "binomial", data = df)

plot(fitted.values(model), predict(model, newdata = df, type = "response"))
Upvotes: 2
Views: 1553
Reputation: 6812
The model:
glmer(y ~ (1 | re1) +
(1 | re2) +
(1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2), family="binomial", data=df)
is pathological: the terms (1 | re1) and (1 | re2) are redundant, because (1 + rs1 | re1) and (1 + rs1 | re2) already contain random intercepts for re1 and re2. If the intention was to estimate uncorrelated random intercepts and slopes, you can do that with (1 | re1) + (0 + rs1 | re1) and (1 | re2) + (0 + rs1 | re2).
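Applied to the reprex, the uncorrelated version could be written with lme4's double-bar shorthand, where (1 + rs1 || re1) expands to (1 | re1) + (0 + rs1 | re1). This is a sketch assuming the df from the reprex above:

```r
# Uncorrelated random intercepts and slopes via the double-bar syntax;
# (1 + rs1 || re1) is equivalent to (1 | re1) + (0 + rs1 | re1).
# Note: || only splits terms cleanly for numeric covariates like rs1.
model_uncorr <- glmer(y ~ fe1 + fe2 +
                        (1 | re3) +
                        (1 + rs1 || re1) +
                        (1 + rs1 || re2),
                      family = "binomial", data = df)
```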
If you remove these redundant terms:
model <- glmer(y ~ (1 | re3) +
fe1 + fe2 +
(1 + rs1 | re1) +
(1 + rs1 | re2), family="binomial", data=df)
then you obtain the same values from fitted.values and predict:
plot(fitted.values(model), predict(model, newdata=df, type="response"))
This model is singular (and so is the original one), but that is a different problem.
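You can check for singularity directly; isSingular() reports whether any variance or correlation parameters were estimated at the boundary of their space:

```r
# TRUE when some variance component is ~0 or a correlation is ~ +/-1
isSingular(model, tol = 1e-4)

# Inspect the estimated variance components to see which ones collapsed
VarCorr(model)
```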
Upvotes: 1