glm.nb non-convergence and other options for negative binomial regression

Question

I'm trying to implement a glm with a negative binomial distribution in R and have a few questions. Here are the data I'm working with - note that my predictors are all centred and scaled using 2 standard deviations.

df <- structure(list(offset_var = c(1.28599085563929, 0.56011798945881, 
0.910539373132146, 1.40150012655384, 1.99332290786881, 0.431352036569322, 
1.9664387276399, 0.39501204122542, 0.837952086514098, 1.49309174747163, 
0.185176379335315, 0.340780160418832, 0.23261768703236, 1.9358463846208, 
0.683228994093252, 0.766384544663353, 2.36951602293399, 0.570222733062088, 
2.035225147398, 2.66190465894079, 2.59441609615926, 2.45851192989865, 
9.10131475225225, 1.79085721952421, 1.14341199635633, 1.86464965904907, 
0.969684569635742, 1.48586083003075, 0.398395739771472, 1.57819408350658, 
0.348242838034098, 1.01779798183851, 1.56020949397414, 1.14211413937976, 
1.15472189286642, 2.17372502761175, 2.17372502761175, 1.15472189286642, 
0.91480376034087, 0.52428786649856, 0.663761139444733, 1.85426680323653, 
0.799109081286816, 1.2592920835499, 1.32010595330908, 1.9473416892704, 
1.74079702185659, 1.92731761020335), response_var = c(31, 154, 
40, 71, 3, 43, 66, 180, 62, 27, 84, 81, 66, 14, 40, 35, 48, 155, 
23, 37, 33, 88, 38, 8, 73, 29, 11, 12, 74, 70, 298, 40, 24, 35, 
0, 27, 0, 116, 83, 60, 77, 68, 56, 8, 64, 234, 57, 71), pred1 = c(0.593253968253968, 
0.212619047619048, 0.208888888888889, 0.824603174603175, 0.791031746031746, 
0.096984126984127, 0.589523809523809, 0.096984126984127, 0.783571428571429, 
0.156666666666667, 0, 0.361904761904762, 0.164126984126984, 0.723809523809524, 
0.123095238095238, 0.671587301587302, 0.708888888888889, 0.186507936507937, 
0.77984126984127, 0.0820634920634921, 0.220079365079365, 1, 0.27984126984127, 
0.835793650793651, 0.526031746031746, 0.0298412698412698, 0.641746031746032, 
0.0783333333333333, 0.156666666666667, 0.791031746031746, 0.167857142857143, 
0.641746031746032, 0.817142857142857, 0.126825396825397, 0.160396825396825, 
0.776031746031746, 0.776031746031746, 0.160396825396825, 0.130555555555556, 
0.600714285714286, 0.123095238095238, 0.70515873015873, 0.164126984126984, 
0.492460317460317, 0.0932539682539683, 0.596984126984127, 0.134285714285714, 
0.899206349206349), pred3 = c(0.779445727482678, 0.779445727482678, 
0.981524249422633, 0.981524249422633, 0.431293302540416, 0.431293302540416, 
0.85796766743649, 0.85796766743649, 0.974018475750578, 0.974018475750578, 
0.215357967667437, 0.215357967667437, 0.767898383371826, 0.767898383371826, 
0.929561200923788, 0.929561200923788, 0.319861431870669, 0.319861431870669, 
0.930138568129331, 0.930138568129331, 1, 1, 0.753464203233256, 
0.753464203233256, 0.937644341801385, 0.937644341801385, 0.438799076212472, 
0.438799076212472, 0.960161662817552, 0.960161662817552, 0.852193995381064, 
0.852193995381064, 0.803117782909932, 0.803117782909932, 0.571593533487299, 
0.571593533487299, 0.571593533487299, 0.571593533487299, 0, 0, 
0.801385681293304, 0.801385681293304, 0.656466512702079, 0.656466512702079, 
0.852771362586606, 0.852771362586606, 0.203810623556583, 0.203810623556583
), pred6 = c(0.688053824835413, 0.540952466204905, 0.628399305213738, 
0.6881894451542, 0.780506864063453, 0.596192469087064, 0.664351792193705, 
0.601630037911551, 0.725643181043039, 0.617634868802872, 0.811784880984014, 
0.804339479465123, 0.576349468957153, 0.847678636187369, 0.625702671539389, 
0.719554647550091, 0.363763777714112, 0.446477292582477, 0.474674252997289, 
0.480035185899025, 0.4937661725622, 0, 0.513379111687292, 0.903418101749096, 
0.695930304479286, 0.617335579129072, 0.759590008590574, 0.654198263624173, 
0.638351745840282, 0.758157124611394, 0.616963963666818, 0.749652691590418, 
0.626922408721354, 0.587580912470206, 0.430636110291636, 0.349235608213707, 
0.349235608213707, 0.430636110291636, 0.723455668440276, 1, 0.615287637567703, 
0.696660949537828, 0.570881476208878, 0.51486263616898, 0.599650096633513, 
0.661291018750272, 0.44995045321675, 0.317335721700076), pred2 = c(0.0640028225107189, 
0.0640028225107189, 1, 1, 0.0467149443570715, 0.0467149443570715, 
0.955928169494161, 0.955928169494161, 0.991149754525286, 0.991149754525286, 
0.636541945977623, 0.636541945977623, 0.0695282460368243, 0.0695282460368243, 
0.945044759518499, 0.945044759518499, 0.666620820800469, 0.666620820800469, 
0.865751343981534, 0.865751343981534, 0.158030700783964, 0.158030700783964, 
0, 0, 0.941696017987526, 0.941696017987526, 0.194526003575978, 
0.194526003575978, 0.974286448958601, 0.974286448958601, 0.182153599598151, 
0.182153599598151, 0.834117696305023, 0.834117696305023, 0.729828317197582, 
0.729828317197582, 0.729828317197582, 0.729828317197582, 0.477488683047594, 
0.477488683047594, 0.914726688872012, 0.914726688872012, 0.0551346373492319, 
0.0551346373492319, 0.950905057197701, 0.950905057197701, 0.653584648412039, 
0.653584648412039), pred4 = c(0.521197167238095, 0.521197167238095, 
0.675165075565519, 0.675165075565519, 0.754750741994489, 0.754750741994489, 
0.656214157700682, 0.656214157700682, 0.779025156922262, 0.779025156922262, 
0.466656174770789, 0.466656174770789, 1, 1, 0.19412522736884, 
0.19412522736884, 0, 0, 0.914127451104004, 0.914127451104004, 
0.450341983837599, 0.450341983837599, 0.200288723846985, 0.200288723846985, 
0.51864603534987, 0.51864603534987, 0.632983879774839, 0.632983879774839, 
0.219950187016786, 0.219950187016786, 0.378371968866928, 0.378371968866928, 
0.332552996854781, 0.332552996854781, 0.543573394076179, 0.543573394076179, 
0.543573394076179, 0.543573394076179, 0.110505257947263, 0.110505257947263, 
0.80446053190608, 0.80446053190608, 0.942739062050578, 0.942739062050578, 
0.769130578963835, 0.769130578963835, 0.910376608809312, 0.910376608809312
), pred5 = c(1, 1, 0.277346201087263, 0.277346201087263, 0.977777461589862, 
0.977777461589862, 0.690563309054654, 0.690563309054654, 0.439008770827643, 
0.439008770827643, 0.819995545023047, 0.819995545023047, 0.579039846892373, 
0.579039846892373, 0.365538938318013, 0.365538938318013, 0.0971685384896833, 
0.0971685384896833, 0.599718772091419, 0.599718772091419, 0.871837464542416, 
0.871837464542416, 0.0550210151699806, 0.0550210151699806, 0.740706048133834, 
0.740706048133834, 0.0826796777514591, 0.0826796777514591, 0.52020534667614, 
0.52020534667614, 0, 0, 0.660904950317592, 0.660904950317592, 
0.790027991312289, 0.790027991312289, 0.790027991312289, 0.790027991312289, 
0.206354045553633, 0.206354045553633, 0.600235537123976, 0.600235537123976, 
0.829243121962118, 0.829243121962118, 0.595320097430252, 0.595320097430252, 
0.17138558268267, 0.17138558268267)), row.names = c(NA, -48L), class = "data.frame")

First off, I tried running the model using the glm.nb function in the MASS package, but kept getting non-convergence warnings (glm.fit: algorithm did not converge) even after increasing the number of iterations beyond the default 25 (I tried 50, 100, 250, 1000, and even 5000):

> glm1 <- glm.nb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, control = glm.control(maxit = 5000))
There were 50 or more warnings (use warnings() to see the first 50) #these are the non-convergence warnings
> summary(glm1)

Call:
glm.nb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 + 
    pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)), 
    data = df, control = glm.control(maxit = 5000), init.theta = 0.6779942761, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8944  -0.9971  -0.4541   0.2178   1.9029  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   7.3937     1.2057   6.132 8.65e-10 ***
pred1        -5.2640     2.0354  -2.586   0.0097 ** 
pred2        -0.9040     0.8356  -1.082   0.2793    
pred3        -0.6444     1.0470  -0.616   0.5382    
pred4        -1.1811     1.1321  -1.043   0.2968    
pred5        -0.2583     1.0225  -0.253   0.8006    
pred6        -0.6567     1.0222  -0.642   0.5206    
pred1:pred2   2.0800     1.5741   1.321   0.1864    
pred1:pred3   1.7663     2.0589   0.858   0.3910    
pred1:pred4   0.6586     2.1055   0.313   0.7544    
pred1:pred5   0.4524     2.0225   0.224   0.8230    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.791) family taken to be 1)

    Null deviance: 68.201  on 47  degrees of freedom
Residual deviance: 57.762  on 37  degrees of freedom
AIC: 541.63

Number of Fisher Scoring iterations: 5000


              Theta:  0.678 
          Std. Err.:  0.124 
Warning while fitting theta: alternation limit reached 

 2 x log-likelihood:  -517.634

I know I am trying to fit a complex model given my sample size, so I am assuming this is why the model is not converging (I want to fit such a complex model because I am going to use it as the global model upon which to perform model averaging). However, I then tried alternative packages that can fit glms with negative binomial distributions in R. When I tried glmmTMB and brglm2the models converged and I got the following (very similar) results:

library(glmmTMB)
library(brglm2)

> #fit model using glmmTMB
> glm2 <- glmmTMB(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, family = nbinom2)
> summary(glm2)
 Family: nbinom2  ( log )
Formula:          response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 +      pred1 * pred5 + pred6 + offset(log(offset_var))
Data: df

     AIC      BIC   logLik deviance df.resid 
   538.8    561.3   -257.4    514.8       36 


Dispersion parameter for nbinom2 family (): 0.815 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.09772    1.36899   5.185 2.16e-07 ***
pred1       -5.42650    2.23604  -2.427   0.0152 *  
pred2       -1.10202    0.79743  -1.382   0.1670    
pred3       -0.39245    1.14313  -0.343   0.7314    
pred4       -1.33843    1.32065  -1.013   0.3108    
pred5       -0.07125    1.09954  -0.065   0.9483    
pred6       -0.17567    1.03227  -0.170   0.8649    
pred1:pred2  2.80613    1.69120   1.659   0.0971 .  
pred1:pred3  1.04996    2.04619   0.513   0.6079    
pred1:pred4  1.15544    2.37031   0.487   0.6259    
pred1:pred5 -0.10083    2.31516  -0.044   0.9653    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #fit model using brglm2
> glm_3 <- brnb(response_var~pred1*pred2 + pred1*pred3 + pred1*pred4 + pred1*pred5 +  pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML", transformation = "identity")
> summary(glm_3)

Call:
brnb(formula = response_var ~ pred1 * pred2 + pred1 * pred3 + pred1 * pred4 + pred1 * pred5 + pred6 + offset(log(offset_var)), data = df, link = "log", type = "ML", 
    transformation = "identity")


Coefficients (mean model with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.09773    1.18882   5.970 2.37e-09 ***
pred1       -5.42651    2.00939  -2.701  0.00692 ** 
pred2       -1.10203    0.82354  -1.338  0.18084    
pred3       -0.39243    1.03189  -0.380  0.70372    
pred4       -1.33854    1.11593  -1.199  0.23034    
pred5       -0.07116    1.00813  -0.071  0.94373    
pred6       -0.17567    1.00911  -0.174  0.86180    
pred1:pred2  2.80608    1.55187   1.808  0.07058 .  
pred1:pred3  1.05001    2.03198   0.517  0.60534    
pred1:pred4  1.15563    2.07750   0.556  0.57803    
pred1:pred5 -0.10104    1.99736  -0.051  0.95965    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter  with identity transformation function: 1.2267971)

    Null deviance: 81.222  on 47  degrees of freedom
Residual deviance: 57.153  on 37  degrees of freedom
AIC: 538.825

Type of estimator: ML (maximum likelihood)
Number of quasi-Fisher scoring iterations:32

My questions are:

Why would the model not converge using glm.nb but converge using the other two packages? Should I be suspicious of the results from glmmTMB or brglm2 given the issues with convergence in using glm.nb?
glmmTMB was by far the fastest so I'd like to use that package going forward. However I am wondering: given that glmmTMB is a package for generalized linear mixed modelling, is there any issue with fitting a model in glmmTMB without random effects? Or would the only tradeoff be the slightly higher standard errors for the coefficients fit using glmmTMB compared with those from brglm2?

glm.nb non-convergence and other options for negative binomial regression

Answers (1)

Related Questions