Convergence issues glmer: How to interpret allFit outcomes and comparing models with different optimizers

Question

I am running a binomial logistic regression using lme4, which is constructed as follows:

m2_DL <- glmer(Dest ~ Pb_on * Pb_type + Dest_tree + ZF_Dest + (1|Site), data = DF_DL_3, family = binomial(link = "logit"))

All variables are factors save for ZF_Dest, which is numerical. The full model (m1) includes one extra numerical variable and works fine. This second model however fails to converge with warning message:

fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00144561 (tol = 0.001, component 1)

The rank deficiency is expected in my case and (I assume) not relevant to the problem here. After looking online I found that I could try to add the following argument to my glmer: control = glmerControl(optimizer = "bobyqa"), which indeed resolves the issue.

However, I wished to better understand what this function did and if I could still compare my models if only my second model has this extra argument. This brought me to this question on SO. One of the comments on this question states: "you did not use a different optimizer (bobyqa is the default for glmer)". But if this is the case then how come adding the argument resolved the warning, as I did not increase the number of iterations as is sometimes also done?

From the answer on that question I understood that if the difference of the likelihoods of both models is close I indicate a 'false positive' of the convergence error, which again is the case for my models. I also read help(converge) as recommended and used the allFit function indicated there. This outputs:

bobyqa : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
Nelder_Mead : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
nlminbwrap : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
nmkbw : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
optimx.L-BFGS-B : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
nloptwrap.NLOPT_LN_NELDERMEAD : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
nloptwrap.NLOPT_LN_BOBYQA : fixed-effect model matrix is rank deficient so dropping 6 columns / coefficients
[OK]
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0013785 (tol = 0.001, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00904036 (tol = 0.001, component 1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00904036 (tol = 0.001, component 1)

and the object containing the results of allFit contains:

original model:
Dest ~ Pb_on * Pb_type + Dest_tree + ZF_Dest + (1 | Site) 
data:  DF_DL_3 
optimizers (7): bobyqa, Nelder_Mead, nlminbwrap, nmkbw, optimx.L-BFGS-B, nloptwrap.NLOPT_LN_N...
1 optimizer(s) failed
differences in negative log-likelihoods:
max= 6.65e-05 ; std dev= 3.41e-05

Here you can see that 1 optimizer failed and I get several warnings when using allFit. Now I understand that not all optimizers are made equal and some won't work for certain data. The problem is that I don't understand how to interpret the outcome of this allFit function, and I am not sure which optimizer failed and if I should be worried. So in summary, my questions:

How could the BOBYQA optimizer resolve the issue if this is indeed the default optimizer?
How should I interpret the outcome of allFit.
What is the best way to resolve the warning for the second model. Is simply adding control = glmerControl(optimizer = "bobyqa") suitable?
If I add this argument to this model, can I then still fairly compare this model to others? (Note: I use the anova function in order to compare models).

Thanks

EDIT 6-3-2020: As per Roland's request here is the summary of the allFit object:

$which.OK
                       bobyqa                   Nelder_Mead                    nlminbwrap                         nmkbw 
                         TRUE                          TRUE                          TRUE                          TRUE 
              optimx.L-BFGS-B nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
                        FALSE                          TRUE                          TRUE 

$msgs
$msgs$bobyqa
NULL

$msgs$Nelder_Mead
NULL

$msgs$nlminbwrap
NULL

$msgs$nmkbw
[1] "Model failed to converge with max|grad| = 0.0013785 (tol = 0.001, component 1)"

$msgs$nloptwrap.NLOPT_LN_NELDERMEAD
[1] "Model failed to converge with max|grad| = 0.00904036 (tol = 0.001, component 1)"

$msgs$nloptwrap.NLOPT_LN_BOBYQA
[1] "Model failed to converge with max|grad| = 0.00904036 (tol = 0.001, component 1)"


$fixef
                              (Intercept)    Pb_onL    Pb_onN     Pb_onS Pb_typeNG Pb_typeSong Dest_treeL    ZF_Dest
bobyqa                          0.4421569 0.5633434 0.1788602 -0.2046165 0.1777216  -0.1115736  0.5845617 0.01578428
Nelder_Mead                     0.4420709 0.5633071 0.1788488 -0.2046262 0.1777301  -0.1115980  0.5845859 0.01578563
nlminbwrap                      0.4421020 0.5633365 0.1788603 -0.2046102 0.1777225  -0.1115746  0.5845682 0.01578510
nmkbw                           0.4423417 0.5635447 0.1791150 -0.2045143 0.1781367  -0.1114576  0.5844313 0.01578639
nloptwrap.NLOPT_LN_NELDERMEAD   0.4434285 0.5612392 0.1765273 -0.2063090 0.1773619  -0.1140108  0.5852130 0.01579032
nloptwrap.NLOPT_LN_BOBYQA       0.4434285 0.5612392 0.1765273 -0.2063090 0.1773619  -0.1140108  0.5852130 0.01579032
                              Pb_onL:Pb_typeNG Pb_onN:Pb_typeNG Pb_onL:Pb_typeSong Pb_onN:Pb_typeSong
bobyqa                               -1.140386       -0.1635209         -0.5446501         -0.2497627
Nelder_Mead                          -1.140380       -0.1635345         -0.5445915         -0.2497404
nlminbwrap                           -1.140389       -0.1635264         -0.5446236         -0.2497616
nmkbw                                -1.140884       -0.1640907         -0.5448729         -0.2500731
nloptwrap.NLOPT_LN_NELDERMEAD        -1.138619       -0.1624441         -0.5413400         -0.2461967
nloptwrap.NLOPT_LN_BOBYQA            -1.138619       -0.1624441         -0.5413400         -0.2461967

$llik
                       bobyqa                   Nelder_Mead                    nlminbwrap                         nmkbw 
                    -826.3143                     -826.3143                     -826.3143                     -826.3143 
nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
                    -826.3143                     -826.3143 

$sdcor
                              Site.(Intercept)
bobyqa                                1.537530
Nelder_Mead                           1.537517
nlminbwrap                            1.537501
nmkbw                                 1.537453
nloptwrap.NLOPT_LN_NELDERMEAD         1.538042
nloptwrap.NLOPT_LN_BOBYQA             1.538042

$theta
                              Site.(Intercept)
bobyqa                                1.537530
Nelder_Mead                           1.537517
nlminbwrap                            1.537501
nmkbw                                 1.537453
nloptwrap.NLOPT_LN_NELDERMEAD         1.538042
nloptwrap.NLOPT_LN_BOBYQA             1.538042

$times
                              user.self sys.self elapsed user.child sys.child
bobyqa                             6.51        0    6.55         NA        NA
Nelder_Mead                        3.06        0    3.06         NA        NA
nlminbwrap                         2.22        0    2.22         NA        NA
nmkbw                              2.17        0    2.17         NA        NA
nloptwrap.NLOPT_LN_NELDERMEAD      0.94        0    0.94         NA        NA
nloptwrap.NLOPT_LN_BOBYQA          0.86        0    0.86         NA        NA

$feval
                       bobyqa                   Nelder_Mead                    nlminbwrap                         nmkbw 
                         3098                          1274                            NA                           672 
nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
                           97                            97 

attr(,"class")
[1] "summary.allFit"

Using car::vif(full_model) to check for collinearity results in:

> car::vif(m1_DD)
                   GVIF Df GVIF^(1/(2*Df))
Pb_on         17.606990  3        1.612925
Pb_type       20.180212  2        2.119490
ZF_Dest        1.015237  1        1.007589
ZF_Other       1.006738  1        1.003364
Pb_on:Pb_type 96.910527  4        1.771317

As I am not 100% sure how to interpret these, I used this link which suggests I shouldn't be too worried. As one answer states "The rule of GVIF(1/(2×Df))<2 is applied in some publications, which would equal to an ordinary VIF of 4 for one-coefficient variables". So Pb_type might form a problem, but I have often seen up to a VIF of 10 being seen as "acceptable" so I think this is fine.

EDIT 10-3-2020 After some more searching (now that I know what the bobyqa really does, thanks to meriops!) I came across, this thread. Reading the answer there, I believe that my answers are now as follows:

As per what meriops said: "glmer uses a two-stage optimization by default, with bobyqa for the first stage and Nelder-Mead for the second stage. So when you change the optimizer to bobyqa, you use bobyqa for both stages."
Based on Ben Bolker's answer in this thread, and my own likelihoods differing <0.1, it might mean that the warning was a false positive to begin with. For now I still choose to adjust the optimizers.
Based on the aforementioned thread I am now under the assumption that this is indeed fine.
Similarly, the aforementioned thread suggests this not to be an issue in my case. As my models are indeed nested.

If any of you wish to elaborate your comments or adjust them into full answers then please do so, I will then accept them.

Convergence issues glmer: How to interpret allFit outcomes and comparing models with different optimizers

Answers (1)

Related Questions