Leprechault
Leprechault

Reputation: 1823

Poisson generalized linear mixed models (GLMMs): hard decision between lme4 and glmmADMB

Random coefficient Poisson models are rather difficult to fit, there tends to be some variability in parameter estimates between lme4 and glmmADMB. But in my case:

# Packages
library(lme4)
library(glmmADMB)

#Open my dataset

myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
str(myds)

# 'data.frame': 526 obs. of  10 variables:
#  $ Bioma      : chr  "Pampa" "Pampa" "Pampa" "Pampa" ...
#  $ estacao    : chr  "verao" "verao" "verao" "verao" ...
#  $ ciclo      : chr  "1°" "1°" "1°" "1°" ...
#  $ Hour       : int  22 23 0 1 2 3 4 5 6 7 ...
#  $ anthill    : num  23.5 23.5 23.5 23.5 23.5 ...
#  $ formigueiro: int  2 2 2 2 2 2 2 2 2 2 ...
#  $ ladenant   : int  34 39 29 25 20 31 16 28 21 12 ...
#  $ unladen    : int  271 258 298 317 316 253 185 182 116 165 ...
#  $ UR         : num  65.7 69 71.3 75.8 78.1 ...
#  $ temp       : num  24.3 24.3 24 23.7 23.1 ...

I have a number of insects (ladenant) in the function of biome(Bioma), temperature (temp) and humidity(UR), but formiguieros is pseudoreplication. Then I try to model the relationship using lme4 and glmmADMB.

First I try lme4:

 m.laden.1  <- glmer(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = poisson(link = "log"))
    summary(m.laden.1)
    # Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
    #  Family: poisson  ( log )
    # Formula: ladenant ~ Bioma + poly(temp, 2) + UR + (1 | formigueiro)
    #    Data: DataBase
    
    #      AIC      BIC   logLik deviance df.resid 
    #  21585.9  21615.8 -10786.0  21571.9      519 
    
    # Scaled residuals: 
    #     Min      1Q  Median      3Q     Max 
    # -10.607  -4.245  -1.976   2.906  38.242 
    
    # Random effects:
    #  Groups      Name        Variance Std.Dev.
    #  formigueiro (Intercept) 0.02049  0.1432  
    # Number of obs: 526, groups:  formigueiro, 5
    
    # Fixed effects:
    #                   Estimate Std. Error z value Pr(>|z|)    
    # (Intercept)      0.7379495  0.0976701   7.556 4.17e-14 ***
    # BiomaTransition  1.3978383  0.0209623  66.684  < 2e-16 ***
    # BiomaPampa      -0.1256759  0.0268164  -4.687 2.78e-06 ***
    # poly(temp, 2)1   7.1035195  0.2079550  34.159  < 2e-16 ***
    # poly(temp, 2)2  -7.2900687  0.2629908 -27.720  < 2e-16 ***
    # UR               0.0302810  0.0008029  37.717  < 2e-16 ***
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    # Correlation of Fixed Effects:
    #             (Intr) BmTrns BimPmp p(,2)1 p(,2)2
    # BiomaTrnstn -0.586                            
    # BiomaPampa  -0.199  0.352                     
    # ply(tmp,2)1 -0.208  0.267  0.312              
    # ply(tmp,2)2 -0.191  0.085 -0.175 -0.039       
    # UR          -0.746  0.709  0.188  0.230  0.316
    # optimizer (Nelder_Mead) convergence code: 0 (OK)
    # Model is nearly unidentifiable: very large eigenvalue
    #  - Rescale variables?
    

Second I try glmmADMB:

m.laden.2 <- glmmadmb(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = "poisson", link = "log")
summary(m.laden.2)
# Call:
# glmmadmb(formula = ladenant ~ Bioma + poly(temp, 2) + UR + (1 | 
#     formigueiro), data = DataBase, family = "poisson", link = "log")

# AIC: 12033.9 

# Coefficients:
#                 Estimate Std. Error z value Pr(>|z|)    
# (Intercept)      1.52390    0.26923    5.66  1.5e-08 ***
# BiomaTransition  0.23967    0.08878    2.70   0.0069 ** 
# BiomaPampa       0.09680    0.05198    1.86   0.0626 .  
# poly(temp, 2)1  -0.38754    0.55678   -0.70   0.4864    
# poly(temp, 2)2  -1.16028    0.39608   -2.93   0.0034 ** 
# UR               0.01560    0.00261    5.97  2.4e-09 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Number of observations: total=526, formigueiro=5 
# Random effect variance(s):
# Group=formigueiro
#             Variance StdDev
# (Intercept)  0.07497 0.2738

Despite the between different statistical packages and estimations, the models are completely different significance levels in the case of Biome variable. My question is, where is any other approach that can be used to compare the results and choose a final model?

Thanks in advance.

Upvotes: 1

Views: 1791

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226192

I got a little bit carried away. tl;dr as pointed out in comments, it's hard to get glmmADMB to work with a Poisson model, but a model with overdispersion (e.g. negative binomial) is clearly a lot better. Furthermore, you should probably incorporate some aspect of random slopes in the model ...

Packages (colorblindr is optional):

library(lme4)
library(glmmADMB)
library(glmmTMB)
library(broom.mixed)
library(tidyverse) ## ggplot2, dplyr, tidyr, purrr ...
library(colorblindr) ## remotes::install_github("clauswilke/colorblindr")
theme_set(theme_bw())

Get data: standardize input variables so we can easily make a sensible coefficient plot

## read.csv doesn't work for me out of the box, locale/encoding issues
myds <- readr::read_csv("my_glmm_dataset.csv") %>%
  mutate(across(formigueiro, as.factor),
         across(c(UR, temp), ~ drop(scale(.))))

Formulas and models:

form <- ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro)
## random slopes, all independent (also tried with correlations (| instead
## of ||), but fails)
form_x <- ladenant ~ Bioma + poly(temp,2) + UR + (1 + UR + poly(temp,2) || formigueiro)

glmer_pois  <- glmer(form, data = myds, family = poisson(link = "log"))
## fails
glmmADMB_pois  <- try(glmmadmb(form, data = myds, family = "poisson")) 
## fails ("Parameters were estimated, but standard errors were not: 
## the most likely problem is that the curvature at MLE was zero or negative"
glmmTMB_pois  <- glmmTMB(form, data = myds, family = poisson)
glmer_nb2 <- glmer.nb(form, data = myds)
glmmADMB_nb2 <- glmmadmb(form, data = myds, family = "nbinom2")
glmmTMB_nb2 <- update(glmmTMB_pois, family = "nbinom2")
glmmTMB_nb1 <- update(glmmTMB_pois, family = "nbinom1")
glmmTMB_nb2ext <- update(glmmTMB_nb2, formula = form_x)

Put it all together:

modList <- tibble::lst(glmer_pois, glmmTMB_pois, glmer_nb2, glmmADMB_nb2, glmmTMB_nb2,
                       glmmTMB_nb1, glmmTMB_nb2ext)

bbmle::AICtab(modList)

                  dAIC    df
glmmTMB_nb2ext     0.0 11
glmer_nb2         27.0 8 
glmmADMB_nb2      27.1 8 
glmmTMB_nb2       27.1 8 
glmmTMB_nb1       79.5 8 
glmmTMB_pois   16658.0 7 
glmer_pois     16658.0 7 

The 'nb2' models are all OK, but the random-slopes model is considerably better.

Coefficient plots, including the fixed effects from all methods:

tt <- (purrr::map_dfr(modList, tidy, effects = "fixed", conf.int = TRUE,
                      .id = "model") |>
  dplyr::filter(term != "(Intercept)") |>
  tidyr::separate(model, into = c("platform", "distrib"))
)
ggplot(tt, aes(y = term, x = estimate, xmin = conf.low, xmax = conf.high,
               colour = platform, shape = distrib)) +
  geom_pointrange(position = position_dodge(width = 0.25)) +
  geom_vline(xintercept = 0, lty =2) +
  scale_colour_OkabeIto()

coefficient plot

Upvotes: 2

Related Questions