millie0725

Reputation: 393

zeroinfl model - Warning message: In sqrt(diag(object$vcov)) : NaNs produced

I'm trying to run a zero-inflated negative binomial model, but I'm getting a "NaNs produced" warning when I check the model, and it prevents me from seeing the outcomes. Here's some mock data that's a simplified version of my actual data (my real data has many more observations per species, and more species):

df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
                  state = c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"),
                  p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))

Here's the model I'm attempting to run, with an interaction between state and species:

library(pscl)
mod1 <- zeroinfl(p_eaten ~ state * species,
                     dist = 'negbin',
                     data = df1)
summary(mod1)

This is when I get Warning message: In sqrt(diag(object$vcov)) : NaNs produced. How can I fix this warning so that I can see the model outcomes? Thanks!

Using R version 4.0.2, Mac OS X 10.13.6

Upvotes: 1

Views: 13220

Answers (2)

IRTFM

Reputation: 263451

This is pretty thin data for such a complex model, but if you run xtabs on your version of the dataframe you'll see that one of your reference categories has zero counts. If you swap the levels of your state variable, the NAs go away, although some of the large standard errors remain.

xtabs(p_eaten~ state + species, data=df1)
         species
state     Assp Elre Rufl Soca
  ambient    1    2    0    3
  warmed    15   20    3   87
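
One way to swap the levels is relevel(); the transcript below achieves the same thing inline with factor():

df1$state <- relevel(factor(df1$state), ref = "warmed")  # make "warmed" the reference level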

Unedited console output follows:

df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
+                   state = factor(c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"), levels=c("warmed","ambient")),
+                   p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))
> xtabs(p_eaten~ state + species, data=df1)
         species
state     Assp Elre Rufl Soca
  warmed    15   20    3   87
  ambient    1    2    0    3

Attempt:

> library(pscl)
> mod1 <- zeroinfl(p_eaten ~ state * species,
+                      dist = 'negbin',
+                      data = df1)
> summary(mod1)

Call:
zeroinfl(formula = p_eaten ~ state * species, data = df1, dist = "negbin")

Pearson residuals:
     Min       1Q   Median       3Q      Max 
-0.98868 -0.80384 -0.00342  0.80387  0.98872 

Count model coefficients (negbin with log link):
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.708e+00  2.582e-01  10.488  < 2e-16 ***
stateambient             -3.401e+00  1.033e+00  -3.292 0.000994 ***
speciesElre               2.877e-01  3.416e-01   0.842 0.399623    
speciesRufl              -1.671e+00  6.874e-01  -2.431 0.015068 *  
speciesSoca               1.758e+00  2.796e-01   6.288 3.22e-10 ***
stateambient:speciesElre  8.714e-01  1.400e+00   0.622 0.533627    
stateambient:speciesRufl -3.972e-04         NA      NA       NA    
stateambient:speciesSoca -2.763e-02  1.218e+00  -0.023 0.981906    
Log(theta)                1.501e+01  1.848e+02   0.081 0.935234    

Zero-inflation model coefficients (binomial with logit link):
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)              -1.327e-04  1.414e+00   0.000    1.000
stateambient             -8.538e+00  1.206e+02  -0.071    0.944
speciesElre               1.379e-04  2.000e+00   0.000    1.000
speciesRufl              -1.267e-01  2.083e+00  -0.061    0.952
speciesSoca               1.757e-04  2.000e+00   0.000    1.000
stateambient:speciesElre  8.016e+00  1.206e+02   0.066    0.947
stateambient:speciesRufl  1.757e+01         NA      NA       NA
stateambient:speciesSoca  8.411e+00  1.206e+02   0.070    0.944
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 3316847.5216 
Number of iterations in BFGS optimization: 41 
Log-likelihood: -21.87 on 17 Df
Warning message:
In sqrt(diag(object$vcov)) : NaNs produced
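
The warning comes from taking square roots of the diagonal of the model's estimated variance-covariance matrix. A quick diagnostic sketch (using the mod1 object fitted above) shows which entries are negative or missing:

d <- diag(mod1$vcov)   # estimated variances of the coefficients
d[is.na(d) | d < 0]    # the entries that yield NA/NaN standard errors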

Variance-covariance matrices are supposed to be positive definite, and negative diagonal values (besides producing NaN standard errors) may preclude inverting the matrix. When a similar question was posed on Cross Validated, Ben Bolker suggested using the brglm2 version of glm:

> library(brglm2)
> summary(m1 <- glm(p_eaten~ state * species, data=df1,
+             family=poisson,
+             method="brglmFit"))

Call:
glm(formula = p_eaten ~ state * species, family = poisson, data = df1, 
    method = "brglmFit")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-9.3541  -1.8708  -0.7071   0.8567   5.7542  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                2.0477     0.2540   8.062 7.52e-16 ***
stateambient              -2.3354     0.8551  -2.731  0.00631 ** 
speciesElre                0.2796     0.3366   0.831  0.40619    
speciesRufl               -1.4881     0.5918  -2.514  0.01192 *  
speciesSoca                1.7308     0.2756   6.281 3.37e-10 ***
stateambient:speciesElre   0.2312     1.0863   0.213  0.83142    
stateambient:speciesRufl   0.3895     1.7369   0.224  0.82258    
stateambient:speciesSoca  -0.8835     1.0141  -0.871  0.38362    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 443.21  on 15  degrees of freedom
Residual deviance: 183.08  on  8  degrees of freedom
AIC: 225.38

Number of Fisher Scoring iterations: 1

In point of fact, the need for an interaction seems questionable: the change in deviance when the interaction terms are removed is minuscule. It's hard to know whether that might also occur with your full dataset.
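
A minimal sketch of that comparison as a drop-in-deviance test (the model names and the pchisq check are my additions, not part of the fit above):

library(brglm2)

m_full <- glm(p_eaten ~ state * species, data = df1,
              family = poisson, method = "brglmFit")
m_add  <- glm(p_eaten ~ state + species, data = df1,
              family = poisson, method = "brglmFit")

## change in deviance from dropping the interaction, tested against chi-squared
dev_diff <- deviance(m_add) - deviance(m_full)
df_diff  <- df.residual(m_add) - df.residual(m_full)
pchisq(dev_diff, df = df_diff, lower.tail = FALSE)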

Upvotes: 1

Ben Bolker

Reputation: 226712

This is most likely a case of complete separation, although it's impossible to know for sure without your full data set.

This is likely to happen when you have categories that are all-zero, or all-nonzero. In the example you gave above:

with(df1,table(species,state,p_eaten==0))
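
For the mock data above, this tabulates as:

, ,  = FALSE

       state
species ambient warmed
   Assp       1      1
   Elre       1      1
   Rufl       0      1
   Soca       1      1

, ,  = TRUE

       state
species ambient warmed
   Assp       1      1
   Elre       1      1
   Rufl       2      1
   Soca       1      1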

The FALSE slice shows that there are no observations where species=="Rufl", state=="ambient", and p_eaten==0 is FALSE; in other words, all of the observations are zero for this combination of factors. Thus any coefficient that involves a comparison with this combination will have a parameter value of large magnitude (i.e. abs(beta) >> 1); it should theoretically be infinite, but usually ends up somewhere between 10 and 30 (depending on where the numerical methods give up). These coefficients will either have ridiculously large standard errors and (Wald) confidence intervals, or (as in your case) NaN values.

This description holds for the zero-inflation coefficients for speciesRufl and statewarmed:speciesRufl. The count-model coefficients are not large, but still have NaN standard errors, I think because their uncertainties are related to the uncertainty of the zero-inflation coefficients.

Count model coefficients (negbin with log link):
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -0.69312    1.00016  -0.693 0.488307    
...
speciesRufl             -1.69427        NaN     NaN      NaN    

Zero-inflation model coefficients (binomial with logit link):
                        Estimate Std. Error z value Pr(>|z|)
...
speciesRufl               17.560        NaN     NaN      NaN

What can you do about this?

  • Ignore the problem. You won't be able to get reliable standard errors (and hence p-values, confidence intervals, etc.) for the coefficients that are affected, but the model is still OK in principle. Likelihood ratio tests (comparing the log-likelihoods of models with vs. without specified sets of predictors) are still OK and can be used to get p-values for specified effects, e.g. via lmtest::lrtest() (see the sketch after this list).
  • Simplify your model: lump some categories, decide whether you really need zero-inflation, etc.
  • There are a variety of other approaches that involve penalization or imposing Bayesian priors on the relevant coefficients to keep them sensible (e.g. the brglm2 package), but I don't know if any of these are implemented/available for zeroinfl models [you could do this, e.g., via the brms package, but that would involve a lot of work getting up to speed with the foundations of the package]
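
A minimal sketch of the likelihood-ratio test from the first bullet (the model names are mine, and this assumes both fits converge on the real data):

library(pscl)
library(lmtest)

mod_full <- zeroinfl(p_eaten ~ state * species, dist = "negbin", data = df1)
mod_add  <- zeroinfl(p_eaten ~ state + species, dist = "negbin", data = df1)

## compares log-likelihoods directly, so it works even when Wald SEs are NaN
lrtest(mod_add, mod_full)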

Upvotes: 1
