Sara Lily
Sara Lily

Reputation: 43

LME4 GLMMs are different when constructed as success | trials vs raw data?

Why are these GLMMs so different?

Both are made with lme4, both use the same data, but one is framed in terms of successes and trials (m1bin) while one just uses the raw accuracy data (m1). Have I been completely mistaken thinking that lme4 figures out the binomial structure from the raw data this whole time? (BRMS does it just fine.) I'm scared, now, that some of my analyses will change.

d:
                                                                             uniqueid        dim incorrectlabel accuracy
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental       marginal        0
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental        extreme        1
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8   relevant       marginal        1
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental       marginal        1
5 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8   relevant       marginal        0
6 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental       marginal        0

dbin:
                                                                             uniqueid        dim incorrectlabel right count
                                                                                 <fctr>     <fctr>         <fctr> <int> <int>
1 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental        extreme     3     3
2 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8 incidental       marginal     1     5
3 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8   relevant        extreme     3     4
4 A10LVHTF26QHQC:3X4MXAO0BGONT6U9HL2TG8P9YNBRW8   relevant       marginal     3     4
5 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental        extreme     3     4
6 A16HSMUJ7C7QA7:3DY46V3X3PI4B0HROD2HN770M46557 incidental       marginal     2     4
> summary(m1bin)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: cbind(right, count) ~ dim * incorrectlabel + (1 | uniqueid)
     Data: dbin

         AIC      BIC   logLik deviance df.resid 
     398.2    413.5   -194.1    388.2      151 

Scaled residuals: 
         Min       1Q   Median       3Q      Max 
-1.50329 -0.53743  0.08671  0.38922  1.28887 

Random effects:
 Groups   Name        Variance Std.Dev.
 uniqueid (Intercept) 0        0       
Number of obs: 156, groups:  uniqueid, 39

Fixed effects:
                                                                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)                        -0.48460    0.13788  -3.515  0.00044 ***
dimrelevant                        -0.13021    0.20029  -0.650  0.51562    
incorrectlabelmarginal             -0.15266    0.18875  -0.809  0.41863    
dimrelevant:incorrectlabelmarginal -0.02664    0.27365  -0.097  0.92244    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
                        (Intr) dmrlvn incrrc
dimrelevant -0.688              
incrrctlblm -0.730  0.503       
dmrlvnt:ncr  0.504 -0.732 -0.690
> summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: accuracy ~ dim * incorrectlabel + (1 | uniqueid)
     Data: d

         AIC      BIC   logLik deviance df.resid 
     864.0    886.2   -427.0    854.0      619 

Scaled residuals: 
        Min      1Q  Median      3Q     Max 
-1.3532 -1.0336  0.7524  0.9350  1.1514 

Random effects:
 Groups   Name        Variance Std.Dev.
 uniqueid (Intercept) 0.04163  0.204   
Number of obs: 624, groups:  uniqueid, 39

Fixed effects:
                                         Estimate Std. Error z value Pr(>|z|)  
(Intercept)          0.140946   0.088242   1.597   0.1102  
dim1                 0.155923   0.081987   1.902   0.0572 .
incorrectlabel1      0.180156   0.081994   2.197   0.0280 *
dim1:incorrectlabel1 0.001397   0.082042   0.017   0.9864  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
                        (Intr) dim1  incrr1
dim1        0.010              
incrrctlbl1 0.128  0.006       
dm1:ncrrct1 0.005  0.138 0.010 

I figured they'd be the same. Modeling both in BRMS gives the same models with the same estimates.

Upvotes: 2

Views: 670

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226382

They should be the same (up to small numerical differences: see below), except for the log-likelihoods and metric based on them (although differences among a series of models in log-likelihoods/AIC/etc. should be the same). I think your problem is using cbind(right, count) rather than cbind(right, count-right): from ?glm,

For binomial ... families the response can also be specified as ... a two-column matrix with the columns giving the numbers of successes and failures.

(emphasis added to point out this is not number of successes and total, but successes and failures).

Here's an example with one of the built-in data sets, comparing fits to an aggregated and a disaggregated data set:

library(lme4)
library(dplyr)

## disaggregate
cbpp_disagg <- cbpp %>% mutate(obs=seq(nrow(cbpp))) %>%
  group_by(obs,herd,period,incidence) %>%
  do(data.frame(disease=rep(c(0,1),c(.$size-.$incidence,.$incidence))))

nrow(cbpp_disagg) == sum(cbpp$size)  ## check

g1 <- glmer(cbind(incidence,size-incidence)~period+(1|herd),
            family=binomial,cbpp)
g2 <- glmer(disease~period+(1|herd),
            family=binomial,cbpp_disagg)

## compare results
all.equal(fixef(g1),fixef(g2),tol=1e-5)
all.equal(VarCorr(g1),VarCorr(g2),tol=1e-6)

Upvotes: 5

Related Questions