Li Sun

Reputation: 107

Why does the plot of conditional means (conditional mode) or random effects look like this?

I am fitting a random effects model with glmer from lme4 in R. The model looks OK to me.

enter image description here

My understanding is that the random effects come from a normal distribution with mean 0 and variance 1.632 (see above), so I expected the conditional means (or conditional modes, obtained with getME(modelfit, 'b')) to follow a bell curve, more or less. However, when I plot the histogram of the conditional means, it looks very strange: it looks like two separate distributions split at 0. The plot is here:

enter image description here

The corresponding Q-Q plot of the conditional modes: enter image description here

Does anyone know what this means? Is there some strong confounder? Or can it just behave like this?

Upvotes: 2

Views: 1522

Answers (2)

Li Sun

Reputation: 107

Thank you @BenBolker for your help. This is what I finally concluded from my own trials and experience; it is not exactly an answer, I just want to summarize a little.

1. Conditional means are predictions of the Normal random variables behind the random effects. The histogram and Q-Q plot of all conditional means from a specific data set can look like basically anything; in most realistic cases they will not follow a bell curve. This depends entirely on your data. In the example I posted above, the histogram has two modes because the fixed effects do not help much with prediction, so the random effects are 'dragged' into two modes to help the model reach an optimal fit. To understand this, see https://github.com/rikku1983/Mixed-model/blob/master/diagnostic1.png. In this plot, the x axes are values on the link scale from the fixed effects only, from the random effects only (the conditional modes, because I have only one random effect), and from both. The numbers in the plots are predictive power.

2. This leads to a natural question: does the distribution of the conditional modes need to follow a bell curve to satisfy the assumptions? I don't think so, because it depends on your data. In other words, if your data are not a good representation of the population, then even if the model is close to the truth, the conditional modes will not follow a Normal distribution the way the population does.

3. This leads to a more general question: how should we diagnose a generalized mixed model and test the assumptions of normality and independence between and among the random components? I searched on Google but have not found anything I think is really helpful. Any suggestions are welcome.

Again, none of the above is guaranteed to be right. I am just putting my understanding up for discussion, if it is worth it.

Upvotes: 0

Ben Bolker

Reputation: 226057

@RomanLustrik is correct to distinguish between the underlying assumption of Normality of the conditional mode and the estimates of the conditional modes themselves. The estimates need not be Normal; see ?qqmath.ranef.mer for diagnostic plots of the distribution of the conditional modes. If the distribution of your conditional modes is far from Normal, then you may indeed have a problem. Unfortunately, relaxing the assumption of Normality makes the modeling somewhat harder. You might, for example, be able to use a latent mixture model where you assume that the conditional modes are drawn from a mixture of two Normals - but I don't know offhand of an R package that implements this; if I were going to do it I would probably implement it using a toolbox like JAGS or Stan.
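As a rough illustration of how estimated conditional modes can look non-Normal even when the true random effects are Normal, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration (the data frame `d`, the grouping factor `g`, the intercept-only model, the true standard deviation of 1.3); it is not the original poster's model. With an intercept-only model and two binary observations per group, a group's conditional mode depends only on its number of successes, so at most three distinct values can appear.

```r
library(lme4)     # glmer(), ranef()
library(lattice)  # qqmath() generic used by the ranef plotting method

set.seed(1)
n_groups <- 500

# Hypothetical data: 2 Bernoulli observations per group,
# generated from truly Normal random intercepts.
d <- data.frame(g = factor(rep(seq_len(n_groups), each = 2)))
b_true <- rnorm(n_groups, mean = 0, sd = 1.3)
d$y <- rbinom(nrow(d), size = 1, prob = plogis(b_true[as.integer(d$g)]))

# Random-intercept logistic model (default Laplace approximation)
fit <- glmer(y ~ 1 + (1 | g), data = d, family = binomial)

# Conditional modes, with conditional variances attached for plotting
re <- ranef(fit, condVar = TRUE)
modes <- re$g[["(Intercept)"]]

# Tabulate the distinct values the conditional modes take
print(table(round(modes, 2)))

# Q-Q plot of the conditional modes for the grouping factor g
print(qqmath(re)$g)
```

The histogram of `b_true` is bell-shaped by construction, while `modes` clusters on a few values, so a lumpy histogram of conditional modes is not by itself evidence against the Normality assumption.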

Before you go in that direction, it's important to note that the characteristics of your data (approximately 2 Bernoulli observations per group) are such that the default Laplace approximation is expected to be very bad. Try nAGQ=10 (or even higher); it will slow your fitting considerably, but may improve the results.
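A minimal sketch of that comparison, again on simulated data: the data frame `d`, the covariate `x`, the grouping factor `g`, and all true parameter values are hypothetical stand-ins, not the model from the question. Note that `nAGQ` values greater than 1 are only available in `glmer` for models with a single scalar random-effects term, which is the case here.

```r
library(lme4)

set.seed(2)
n_groups <- 300

# Hypothetical data: 2 Bernoulli observations per group, one covariate
d <- data.frame(g = factor(rep(seq_len(n_groups), each = 2)),
                x = rnorm(2 * n_groups))
b_true <- rnorm(n_groups, mean = 0, sd = 1.3)
eta <- -0.5 + 0.8 * d$x + b_true[as.integer(d$g)]
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Default fit: Laplace approximation (nAGQ = 1)
fit_laplace <- glmer(y ~ x + (1 | g), data = d, family = binomial)

# Same model with 10-point adaptive Gauss-Hermite quadrature
fit_agq <- glmer(y ~ x + (1 | g), data = d, family = binomial, nAGQ = 10)

# Helper (hypothetical name): fixed effects plus the random-intercept SD
summarise_fit <- function(fit) {
  c(fixef(fit), sd_g = as.data.frame(VarCorr(fit))$sdcor[1])
}

cmp <- rbind(laplace = summarise_fit(fit_laplace),
             agq10   = summarise_fit(fit_agq))
print(cmp)
```

If the two rows differ noticeably, particularly in the estimated standard deviation of the random intercept, that suggests the Laplace fit should not be trusted for these data.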

Upvotes: 3
