millie0725

Reputation: 393

zeroinfl model - Warning message: In sqrt(diag(object$vcov)) : NaNs produced

I'm trying to run a zero-inflated negative binomial model, but I'm getting a "NaNs produced" warning when I check the model, and it prevents me from seeing the outcomes. Here's some mock data that's a simplified version of my actual data (my real data has many more observations per species, and more species):

df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
                  state = c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"),
                  p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))

Here's the model I'm attempting to run, with an interaction between state and species:

library(pscl)
mod1 <- zeroinfl(p_eaten ~ state * species,
                     dist = 'negbin',
                     data = df1)
summary(mod1)

This is when I get Warning message: In sqrt(diag(object$vcov)) : NaNs produced. How can I fix this warning so that I can see the model outcomes? Thanks!

Using R version 4.0.2, Mac OS X 10.13.6

Upvotes: 1

Views: 13220

Answers (2)

IRTFM

Reputation: 263451

This is pretty thin data for such a complex model, but if you run xtabs on your version of the dataframe you'll see that one of your reference categories has zero counts. If you swap the levels of your state variable, the NAs go away, although some of the large standard errors remain.

xtabs(p_eaten~ state + species, data=df1)
         species
state     Assp Elre Rufl Soca
  ambient    1    2    0    3
  warmed    15   20    3   87
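
One way to swap the levels is relevel(); the transcript below achieves the same thing inline with factor():

df1$state <- relevel(factor(df1$state), ref = "warmed")  # make "warmed" the reference level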

Unedited console output follows:

df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
+                   state = factor(c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"), levels=c("warmed","ambient")),
+                   p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))
> xtabs(p_eaten~ state + species, data=df1)
         species
state     Assp Elre Rufl Soca
  warmed    15   20    3   87
  ambient    1    2    0    3

Attempt:

> library(pscl)
> mod1 <- zeroinfl(p_eaten ~ state * species,
+                      dist = 'negbin',
+                      data = df1)
> summary(mod1)

Call:
zeroinfl(formula = p_eaten ~ state * species, data = df1, dist = "negbin")

Pearson residuals:
     Min       1Q   Median       3Q      Max 
-0.98868 -0.80384 -0.00342  0.80387  0.98872 

Count model coefficients (negbin with log link):
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.708e+00  2.582e-01  10.488  < 2e-16 ***
stateambient             -3.401e+00  1.033e+00  -3.292 0.000994 ***
speciesElre               2.877e-01  3.416e-01   0.842 0.399623    
speciesRufl              -1.671e+00  6.874e-01  -2.431 0.015068 *  
speciesSoca               1.758e+00  2.796e-01   6.288 3.22e-10 ***
stateambient:speciesElre  8.714e-01  1.400e+00   0.622 0.533627    
stateambient:speciesRufl -3.972e-04         NA      NA       NA    
stateambient:speciesSoca -2.763e-02  1.218e+00  -0.023 0.981906    
Log(theta)                1.501e+01  1.848e+02   0.081 0.935234    

Zero-inflation model coefficients (binomial with logit link):
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)              -1.327e-04  1.414e+00   0.000    1.000
stateambient             -8.538e+00  1.206e+02  -0.071    0.944
speciesElre               1.379e-04  2.000e+00   0.000    1.000
speciesRufl              -1.267e-01  2.083e+00  -0.061    0.952
speciesSoca               1.757e-04  2.000e+00   0.000    1.000
stateambient:speciesElre  8.016e+00  1.206e+02   0.066    0.947
stateambient:speciesRufl  1.757e+01         NA      NA       NA
stateambient:speciesSoca  8.411e+00  1.206e+02   0.070    0.944
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 3316847.5216 
Number of iterations in BFGS optimization: 41 
Log-likelihood: -21.87 on 17 Df
Warning message:
In sqrt(diag(object$vcov)) : NaNs produced
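
The warning comes from taking square roots of the diagonal of the model's estimated variance-covariance matrix. A quick diagnostic sketch (using the mod1 object fitted above) shows which entries are negative or missing:

d <- diag(mod1$vcov)   # estimated variances of the coefficients
d[is.na(d) | d < 0]    # the entries that yield NA/NaN standard errors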

Variance-covariance matrices are supposed to be positive definite, and negative diagonal values (besides producing NaN standard errors) may preclude inverting the matrix. When a similar question was posed on Cross Validated, Ben Bolker suggested using the brglm2 version of glm:

> library(brglm2)
> summary(m1 <- glm(p_eaten~ state * species, data=df1,
+             family=poisson,
+             method="brglmFit"))

Call:
glm(formula = p_eaten ~ state * species, family = poisson, data = df1, 
    method = "brglmFit")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-9.3541  -1.8708  -0.7071   0.8567   5.7542  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                2.0477     0.2540   8.062 7.52e-16 ***
stateambient              -2.3354     0.8551  -2.731  0.00631 ** 
speciesElre                0.2796     0.3366   0.831  0.40619    
speciesRufl               -1.4881     0.5918  -2.514  0.01192 *  
speciesSoca                1.7308     0.2756   6.281 3.37e-10 ***
stateambient:speciesElre   0.2312     1.0863   0.213  0.83142    
stateambient:speciesRufl   0.3895     1.7369   0.224  0.82258    
stateambient:speciesSoca  -0.8835     1.0141  -0.871  0.38362    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 443.21  on 15  degrees of freedom
Residual deviance: 183.08  on  8  degrees of freedom
AIC: 225.38

Number of Fisher Scoring iterations: 1

In point of fact, the need for an interaction seems questionable: the change in deviance when the interaction terms are removed is minuscule. It's hard to know whether that might also occur with your full dataset.
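
A minimal sketch of that comparison as a drop-in-deviance test (the model names and the pchisq check are my additions, not part of the fit above):

library(brglm2)

m_full <- glm(p_eaten ~ state * species, data = df1,
              family = poisson, method = "brglmFit")
m_add  <- glm(p_eaten ~ state + species, data = df1,
              family = poisson, method = "brglmFit")

## change in deviance from dropping the interaction, tested against chi-squared
dev_diff <- deviance(m_add) - deviance(m_full)
df_diff  <- df.residual(m_add) - df.residual(m_full)
pchisq(dev_diff, df = df_diff, lower.tail = FALSE)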

Upvotes: 1

Ben Bolker

Reputation: 226712

This is most likely a case of complete separation, although it's impossible to know for sure without your full data set.

This is likely to happen when you have categories that are all-zero, or all-nonzero. In the example you gave above:

with(df1,table(species,state,p_eaten==0))
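
For the mock data above, this tabulates as:

, ,  = FALSE

       state
species ambient warmed
   Assp       1      1
   Elre       1      1
   Rufl       0      1
   Soca       1      1

, ,  = TRUE

       state
species ambient warmed
   Assp       1      1
   Elre       1      1
   Rufl       2      1
   Soca       1      1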

The FALSE slice shows that there are no observations where species=="Rufl", state=="ambient", and p_eaten==0 is FALSE; in other words, all of the observations are zero for this combination of factors. Thus any coefficient that involves a comparison with this combination will have a parameter value of large magnitude (i.e. abs(beta) >> 1); it should theoretically be infinite, but usually ends up somewhere between 10 and 30 (depending on where the numerical methods give up). These coefficients will either have ridiculously large standard errors and (Wald) confidence intervals, or (as in your case) NaN values.

This description holds for the zero-inflation coefficients for speciesRufl and statewarmed:speciesRufl. The count-model coefficients are not large, but still have NaN standard errors, I think because their uncertainties are related to the uncertainty of the zero-inflation coefficients.

Count model coefficients (negbin with log link):
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -0.69312    1.00016  -0.693 0.488307    
...
speciesRufl             -1.69427        NaN     NaN      NaN    

Zero-inflation model coefficients (binomial with logit link):
                        Estimate Std. Error z value Pr(>|z|)
...
speciesRufl               17.560        NaN     NaN      NaN

What can you do about this?

  • Ignore the problem. You won't be able to get reliable standard errors (and hence p-values, confidence intervals, etc.) for the coefficients that are affected, but the model is still OK in principle. Likelihood ratio tests (comparing the log-likelihoods of models with vs. without specified sets of predictors) are still OK and can be used to get p-values for specified effects, e.g. via lmtest::lrtest() (see the sketch after this list).
  • Simplify your model: lump some categories, decide whether you really need zero-inflation, etc.
  • There are a variety of other approaches that involve penalization or imposing Bayesian priors on the relevant coefficients to keep them sensible (e.g. the brglm2 package), but I don't know if any of these are implemented/available for zeroinfl models [you could do this, e.g., via the brms package, but that would involve a lot of work getting up to speed with the foundations of the package]
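
A minimal sketch of the likelihood-ratio test from the first bullet (the model names are mine, and this assumes both fits converge on the real data):

library(pscl)
library(lmtest)

mod_full <- zeroinfl(p_eaten ~ state * species, dist = "negbin", data = df1)
mod_add  <- zeroinfl(p_eaten ~ state + species, dist = "negbin", data = df1)

## compares log-likelihoods directly, so it works even when Wald SEs are NaN
lrtest(mod_add, mod_full)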

Upvotes: 1
