Clustered Standard Error for Zero-Inflated Negative Binomial model

Question

I would like to compute the clustered standard errors for zero-inflated negative binomial model. By default, zeroinfl (from the pscl package) returns standard errors derived using the Hessian matrix returned by optim, e.g.:

library(pscl)
data("bioChemists", package = "pscl")
dim(bioChemists)
head(bioChemists)
## default start values
fm1 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin"))
summary(fm1)

Is there a way to use an asymmetrical/symmetrical distance matrix between observations OR use one of the variables (e.g. kid5 in the toy dataset) to compute clustered standard error?

I found this from an answer at stackexchange, but I am not sure how/whether it can be used with zero-inflated models. The equivalent in Stata's rzinb would probably be cluster clustvar under vce: http://www.stata.com/manuals13/rzinb.pdf .

Any ideas?

Achim Zeileis · Accepted Answer

The development version of the sandwich package on R-Forge has been extended to allow for object-oriented computation of clustered covariances. This also supports zero-inflated regression models. You can install the devel version from R-Forge via:

install.packages("sandwich", repos = "http://R-Forge.R-project.org")

And then load all required packages. The lmtest package is used for the coeftest() function into which covariance matrix estimates can be plugged in.

library("pscl")
library("sandwich")
library("lmtest")

The illustration model you used is the following.

data("bioChemists", package = "pscl")
fm1 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin")

The coeftest() function by default returns the same marginal Wald tests as summary().

coeftest(fm1)
## t test of coefficients:
## 
##                      Estimate  Std. Error t value  Pr(>|t|)    
## count_(Intercept)  0.41674653  0.14359655  2.9022  0.003796 ** 
## count_femWomen    -0.19550683  0.07559256 -2.5863  0.009856 ** 
## count_marMarried   0.09758263  0.08445195  1.1555  0.248199    
## count_kid5        -0.15173246  0.05420606 -2.7992  0.005233 ** 
## count_phd         -0.00070013  0.03626966 -0.0193  0.984603    
## count_ment         0.02478620  0.00349267  7.0966 2.587e-12 ***
## zero_(Intercept)  -0.19168829  1.32281889 -0.1449  0.884815    
## zero_femWomen      0.63593320  0.84891762  0.7491  0.453986    
## zero_marMarried   -1.49946849  0.93867060 -1.5974  0.110518    
## zero_kid5          0.62842720  0.44278263  1.4193  0.156166    
## zero_phd          -0.03771474  0.30800817 -0.1224  0.902572    
## zero_ment         -0.88229322  0.31622813 -2.7901  0.005381 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This can then be easily extended to employ a clustered covariance matrix estimate using the vcovCL() function. Here, the kid5 variable is employed as you suggested. (Note, if someone else is reading this: The usage of kid5 is just to show things "work" but does not really make sense in this application.)

coeftest(fm1, vcov = vcovCL(fm1, cluster = bioChemists$kid5))
## t test of coefficients:
## 
##                      Estimate  Std. Error  t value  Pr(>|t|)    
## count_(Intercept)  0.41674653  0.17009748   2.4500   0.01447 *  
## count_femWomen    -0.19550683  0.01701325 -11.4914 < 2.2e-16 ***
## count_marMarried   0.09758263  0.02401883   4.0628 5.272e-05 ***
## count_kid5        -0.15173246  0.03612916  -4.1997 2.938e-05 ***
## count_phd         -0.00070013  0.04852615  -0.0144   0.98849    
## count_ment         0.02478620  0.00263208   9.4170 < 2.2e-16 ***
## zero_(Intercept)  -0.19168829  0.51865043  -0.3696   0.71177    
## zero_femWomen      0.63593320  0.87775846   0.7245   0.46895    
## zero_marMarried   -1.49946849  1.03481783  -1.4490   0.14768    
## zero_kid5          0.62842720  0.35073624   1.7917   0.07351 .  
## zero_phd          -0.03771474  0.13873870  -0.2718   0.78581    
## zero_ment         -0.88229322  0.07481264 -11.7934 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Clustered Standard Error for Zero-Inflated Negative Binomial model

Answers (1)

Related Questions