Reputation: 393
I'm trying to run zero-inflated negative binomial but I'm coming across a "NaNs produced" warning when checking the model that prevents me from seeing outcomes. Here's some mock data that's a simplified version of my actual data - my real data has many more observations per species + more species:
df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
state = c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"),
p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))
Here's the model I'm attempting to run, with an interaction between state and species:
library(pscl)
mod1 <- zeroinfl(p_eaten ~ state * species,
dist = 'negbin',
data = df1)
summary(mod1)
This is when I get Warning message: In sqrt(diag(object$vcov)) : NaNs produced
. How can I fix this warning message so that I'm able to see model outcomes? Thanks!
Using R version 4.0.2, Mac OS X 10.13.6
Upvotes: 1
Views: 13220
Reputation: 263451
This is pretty thin data for such a complex model but if you do an xtabs
on your version of the dataframe you see that one of your reference categories has zero counts. If you swap the levels of your state
variable , the NA's go away although some of the large standard errors remain.
xtabs(p_eaten~ state + species, data=df1)
species
state Assp Elre Rufl Soca
ambient 1 2 0 3
warmed 15 20 3 87
Uncleaned up console output follows:
df1 <- data.frame(species = c("Rufl","Rufl","Rufl","Rufl","Assp","Assp","Assp","Assp","Elre", "Elre","Elre", "Elre","Soca","Soca","Soca","Soca"),
+ state = factor(c("warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient","warmed","ambient"), levels=c("warmed","ambient")),
+ p_eaten = c(0, 0, 3, 0, 0, 1, 15, 0, 20, 0, 0, 2, 0, 3, 87, 0))
> xtabs(p_eaten~ state + species, data=df1)
species
state Assp Elre Rufl Soca
warmed 15 20 3 87
ambient 1 2 0 3
Attempt:
> library(pscl)
> mod1 <- zeroinfl(p_eaten ~ state * species,
+ dist = 'negbin',
+ data = df1)
> summary(mod1)
Call:
zeroinfl(formula = p_eaten ~ state * species, data = df1, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.98868 -0.80384 -0.00342 0.80387 0.98872
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.708e+00 2.582e-01 10.488 < 2e-16 ***
stateambient -3.401e+00 1.033e+00 -3.292 0.000994 ***
speciesElre 2.877e-01 3.416e-01 0.842 0.399623
speciesRufl -1.671e+00 6.874e-01 -2.431 0.015068 *
speciesSoca 1.758e+00 2.796e-01 6.288 3.22e-10 ***
stateambient:speciesElre 8.714e-01 1.400e+00 0.622 0.533627
stateambient:speciesRufl -3.972e-04 NA NA NA
stateambient:speciesSoca -2.763e-02 1.218e+00 -0.023 0.981906
Log(theta) 1.501e+01 1.848e+02 0.081 0.935234
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.327e-04 1.414e+00 0.000 1.000
stateambient -8.538e+00 1.206e+02 -0.071 0.944
speciesElre 1.379e-04 2.000e+00 0.000 1.000
speciesRufl -1.267e-01 2.083e+00 -0.061 0.952
speciesSoca 1.757e-04 2.000e+00 0.000 1.000
stateambient:speciesElre 8.016e+00 1.206e+02 0.066 0.947
stateambient:speciesRufl 1.757e+01 NA NA NA
stateambient:speciesSoca 8.411e+00 1.206e+02 0.070 0.944
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 3316847.5216
Number of iterations in BFGS optimization: 41
Log-likelihood: -21.87 on 17 Df
Warning message:
In sqrt(diag(object$vcov)) : NaNs produced
Variance-covariance matrices are supposed to be positive definite and negative values may preclude inversion efforts. When a similat question was posed on CrossValidated.com, Ben Bolker suggested using brglm2 version of glm:
> library(brglm2)
> summary(m1 <- glm(p_eaten~ state * species, data=df1,
+ family=poisson,
+ method="brglmFit"))
Call:
glm(formula = p_eaten ~ state * species, family = poisson, data = df1,
method = "brglmFit")
Deviance Residuals:
Min 1Q Median 3Q Max
-9.3541 -1.8708 -0.7071 0.8567 5.7542
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.0477 0.2540 8.062 7.52e-16 ***
stateambient -2.3354 0.8551 -2.731 0.00631 **
speciesElre 0.2796 0.3366 0.831 0.40619
speciesRufl -1.4881 0.5918 -2.514 0.01192 *
speciesSoca 1.7308 0.2756 6.281 3.37e-10 ***
stateambient:speciesElre 0.2312 1.0863 0.213 0.83142
stateambient:speciesRufl 0.3895 1.7369 0.224 0.82258
stateambient:speciesSoca -0.8835 1.0141 -0.871 0.38362
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 443.21 on 15 degrees of freedom
Residual deviance: 183.08 on 8 degrees of freedom
AIC: 225.38
Number of Fisher Scoring iterations: 1
In point of fact the need for an interaction seems questionable. The change in deviance with removal of the interaction terms is miniscule. Hard to know whether than mich also occur with your full dataset.
Upvotes: 1
Reputation: 226712
This is most likely a case of complete separation, although it's impossible to know for sure without your full data set.
This is likely to happen when you have categories that are all-zero, or all-nonzero. In the example you gave above:
with(df1,table(species,state,p_eaten==0))
shows that are no observations where species=="Rufl", state=="ambient", and p
==0 is FALSE
; in other words, all of the observations are zero for this combination of factors. Thus any coefficient that involves a comparison with this state will have a parameter value of large magnitude (i.e. abs(beta) >> 1); it should theoretically be infinite, but is usually somewhere between 10 and 30 (depending on where the numerical methods give up). These coefficients will either have ridiculously large standard errors and (Wald) confidence intervals, or (as in your case) NaN
values.
This description holds for the zero-inflation coefficients for speciesRufl
and statewarmed:speciesRufl
. The count-model coefficients are not large, but still have NaN
standard errors, I think because their uncertainties are related to the uncertainty of the zero-inflation coefficients.
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.69312 1.00016 -0.693 0.488307
...
speciesRufl -1.69427 NaN NaN NaN
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
...
speciesRufl 17.560 NaN NaN NaN
What can you do about this?
lmtest::lrtest()
brglm2
package), but I don't know if any of these are implemented/available for zeroinfl
models [you could do this, e.g., via the brms
package, but that would involve a lot of work getting up to speed with the foundations of the package]Upvotes: 1