Reputation: 107
I am fitting a random effects model with glmer from lme4 in R. The model looks OK to me.
My understanding is that the random effects come from a normal distribution with mean 0 and variance 1.632 (see above). So I was expecting the distribution of the conditional means (or conditional modes, obtained with getME(modelfit, 'b')) to more or less follow a bell curve.
However, when I plotted a histogram of the conditional means, it looked very strange: it looks like 2 separate distributions separated by 0. The plot is here:
The corresponding Q-Q plot of the conditional modes:
Does anyone know what this means? Is there some strong confounder? Or can it just behave like this?
Upvotes: 2
Views: 1522
Reputation: 107
Thank you @BenBolker for your help. I am writing up what I finally concluded from my trials and experience. It is not exactly an answer, but I want to summarize a little.

1. Conditional means are predictions of the normally distributed random effects. The histogram and Q-Q plot of all the conditional means from a specific data set can look like basically anything; in most realistic cases they will not follow a bell curve. This depends entirely on your data. In the example I posted above, we see a two-mode histogram because the fixed effects do not help much with prediction, so the random effects are 'dragged' into two modes to help the model reach the optimal fit. To understand this, see https://github.com/rikku1983/Mixed-model/blob/master/diagnostic1.png. In this plot, the x axis shows values on the link scale from the fixed effects only, from the random effects only (the conditional modes, because I have only one random effect), and from both. The numbers in the plots indicate predictive power.

2. This leads to a natural question: does the distribution of the conditional modes need to follow a bell curve for the assumptions to be satisfied? I don't think so, because it depends on your data. In other words, if your data are not a good representation of the population, then even if the model is close to the truth, the conditional modes will not follow a normal distribution the way the population does.

3. This leads to a more general question: how should we diagnose a generalized mixed model and test the assumptions of normality and independence between and among the random components? I searched on Google but have not yet found anything I think is really helpful. Any suggestions are welcome.

Again, none of the above is guaranteed to be right. I am just putting my understanding up for discussion, if it is worth it.
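A small simulation may help show how the data alone can shape the conditional modes. The sketch below is an illustration under assumptions that do not come from the original model: the data are simulated, all object names (n_groups, dat, fit, modes and so on) are made up, the true random intercepts really are normal with variance 1.632, there are no fixed-effect covariates, and every group has exactly 2 Bernoulli observations. In that setting the conditional mode of a group depends only on its number of successes (0, 1 or 2), so the histogram of the modes can collapse into a few separate spikes even though the underlying random effects are normal.

```r
library(lme4)

set.seed(1)
n_groups <- 300   # hypothetical number of groups
n_per    <- 2     # assumed: 2 Bernoulli observations per group

# Simulate truly normal random intercepts (variance borrowed from the question)
g      <- factor(rep(seq_len(n_groups), each = n_per))
b_true <- rnorm(n_groups, mean = 0, sd = sqrt(1.632))
y      <- rbinom(n_groups * n_per, size = 1, prob = plogis(b_true[as.integer(g)]))
dat    <- data.frame(y = y, g = g)

# Intercept-only random-intercept logistic model (default Laplace approximation)
fit <- glmer(y ~ 1 + (1 | g), data = dat, family = binomial)

# Conditional modes, one per group, in the order of the levels of g
modes     <- ranef(fit)$g[["(Intercept)"]]
successes <- as.vector(tapply(dat$y, dat$g, sum))

# Spread of the conditional modes within each success count;
# groups with identical data should share the same conditional mode
spread <- tapply(modes, successes, function(v) diff(range(v)))
print(spread)

# Compare the simulated random effects with their conditional modes
op <- par(mfrow = c(1, 2))
hist(b_true, main = "Simulated random effects", xlab = "b")
hist(modes, main = "Conditional modes", xlab = "conditional mode")
par(op)
```

With covariates in the model the modes would no longer be restricted to so few values, but with weakly predictive fixed effects they may still cluster by group outcome.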
Upvotes: 0
Reputation: 226057
@RomanLustrik is correct to distinguish between the underlying assumption of Normality of the conditional mode and the estimates of the conditional modes themselves. The estimates need not be Normal; see ?qqmath.ranef.mer for diagnostic plots of the distribution of the conditional modes. If the distribution of your conditional modes is far from Normal, then you may indeed have a problem. Unfortunately, relaxing the assumption of Normality makes the modeling somewhat harder. You might, for example, be able to use a latent mixture model where you assume that the conditional modes are drawn from a mixture of two Normals - but I don't know offhand of an R package that implements this; if I were going to do it I would probably implement it using a toolbox like JAGS or Stan.
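As a rough sketch of the diagnostic plots mentioned above, the example below fits a binomial GLMM and draws Q-Q plots of its conditional modes. The original data are not available, so it uses the cbpp data set that ships with lme4 purely as a stand-in; the object names gm, re and modes are hypothetical.

```r
library(lme4)
library(lattice)  # provides the qqmath() generic

# Stand-in model on a data set bundled with lme4 (not the model from the question)
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

# Conditional modes together with their conditional variances
re <- ranef(gm, condVar = TRUE)

# Q-Q plot of the conditional modes with intervals (see ?qqmath.ranef.mer)
print(qqmath(re))

# Plain normal Q-Q plot of the same conditional modes
modes <- re$herd[["(Intercept)"]]
qqnorm(modes)
qqline(modes)
```

Replacing gm with your own fitted model (and herd with your grouping factor) should give the corresponding plots for your data.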
Before you go in that direction, it's important to note that the characteristics of your data (approximately 2 Bernoulli observations per group) are such that the default Laplace approximation is expected to be very bad. Try nAGQ=10 (or even higher); it will slow your fitting considerably, but may improve the results.
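A minimal sketch of that comparison, on simulated data rather than the original data: the formula, the covariate x, the effect sizes and all object names below are assumptions made for illustration, with 2 Bernoulli observations per group and a true random-intercept variance of 1.632 borrowed from the question.

```r
library(lme4)

set.seed(2)
n_groups <- 400                                   # hypothetical number of groups
g   <- factor(rep(seq_len(n_groups), each = 2))   # 2 Bernoulli observations per group
x   <- rnorm(length(g))                           # made-up covariate
b   <- rnorm(n_groups, mean = 0, sd = sqrt(1.632))
eta <- 0.5 * x + b[as.integer(g)]
dat <- data.frame(y = rbinom(length(g), size = 1, prob = plogis(eta)), x = x, g = g)

# Default fit: Laplace approximation (nAGQ = 1)
fit_laplace <- glmer(y ~ x + (1 | g), data = dat, family = binomial)

# Adaptive Gauss-Hermite quadrature with 10 points; slower, and only
# available for models with a single scalar random effect
fit_agq10 <- glmer(y ~ x + (1 | g), data = dat, family = binomial, nAGQ = 10)

# Compare the estimated random-intercept variances and the fixed effects
var_est <- c(laplace = VarCorr(fit_laplace)$g[1, 1],
             agq10   = VarCorr(fit_agq10)$g[1, 1])
print(var_est)
print(rbind(laplace = fixef(fit_laplace), agq10 = fixef(fit_agq10)))
```

If the two sets of estimates differ noticeably, that suggests the Laplace approximation is not adequate for this kind of data.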
Upvotes: 3