Reputation: 43
Why are these GLMMs so different?
Both are made with lme4, both use the same data, but one is framed in terms of successes and trials (m1bin) while one just uses the raw accuracy data (m1). Have I been completely mistaken thinking that lme4 figures out the binomial structure from the raw data this whole time? (BRMS does it just fine.) I'm scared, now, that some of my analyses will change.
d:
uniqueid dim incorrectlabel accuracy
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 0
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental extreme 1
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 1
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 1
5 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 0
6 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 0
dbin:
uniqueid dim incorrectlabel right count
<fctr> <fctr> <fctr> <int> <int>
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental extreme 3 3
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental marginal 1 5
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant extreme 3 4
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 relevant marginal 3 4
5 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental extreme 3 4
6 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental marginal 2 4
> summary(m1bin)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(right, count) ~ dim * incorrectlabel + (1 | uniqueid)
Data: dbin
AIC BIC logLik deviance df.resid
398.2 413.5 -194.1 388.2 151
Scaled residuals:
Min 1Q Median 3Q Max
-1.50329 -0.53743 0.08671 0.38922 1.28887
Random effects:
Groups Name Variance Std.Dev.
uniqueid (Intercept) 0 0
Number of obs: 156, groups: uniqueid, 39
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.48460 0.13788 -3.515 0.00044 ***
dimrelevant -0.13021 0.20029 -0.650 0.51562
incorrectlabelmarginal -0.15266 0.18875 -0.809 0.41863
dimrelevant:incorrectlabelmarginal -0.02664 0.27365 -0.097 0.92244
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) dmrlvn incrrc
dimrelevant -0.688
incrrctlblm -0.730 0.503
dmrlvnt:ncr 0.504 -0.732 -0.690
> summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: accuracy ~ dim * incorrectlabel + (1 | uniqueid)
Data: d
AIC BIC logLik deviance df.resid
864.0 886.2 -427.0 854.0 619
Scaled residuals:
Min 1Q Median 3Q Max
-1.3532 -1.0336 0.7524 0.9350 1.1514
Random effects:
Groups Name Variance Std.Dev.
uniqueid (Intercept) 0.04163 0.204
Number of obs: 624, groups: uniqueid, 39
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.140946 0.088242 1.597 0.1102
dim1 0.155923 0.081987 1.902 0.0572 .
incorrectlabel1 0.180156 0.081994 2.197 0.0280 *
dim1:incorrectlabel1 0.001397 0.082042 0.017 0.9864
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) dim1 incrr1
dim1 0.010
incrrctlbl1 0.128 0.006
dm1:ncrrct1 0.005 0.138 0.010
I figured they'd be the same. Modeling both in BRMS gives the same models with the same estimates.
Upvotes: 2
Views: 670
Reputation: 226382
They should be the same (up to small numerical differences: see below), except for the log-likelihoods and metric based on them (although differences among a series of models in log-likelihoods/AIC/etc. should be the same). I think your problem is using cbind(right, count)
rather than cbind(right, count-right)
: from ?glm
,
For binomial ... families the response can also be specified as ... a two-column matrix with the columns giving the numbers of successes and failures.
(emphasis added to point out this is not number of successes and total, but successes and failures).
Here's an example with one of the built-in data sets, comparing fits to an aggregated and a disaggregated data set:
library(lme4)
library(dplyr)
## disaggregate
cbpp_disagg <- cbpp %>% mutate(obs=seq(nrow(cbpp))) %>%
group_by(obs,herd,period,incidence) %>%
do(data.frame(disease=rep(c(0,1),c(.$size-.$incidence,.$incidence))))
nrow(cbpp_disagg) == sum(cbpp$size) ## check
g1 <- glmer(cbind(incidence,size-incidence)~period+(1|herd),
family=binomial,cbpp)
g2 <- glmer(disease~period+(1|herd),
family=binomial,cbpp_disagg)
## compare results
all.equal(fixef(g1),fixef(g2),tol=1e-5)
all.equal(VarCorr(g1),VarCorr(g2),tol=1e-6)
Upvotes: 5