Reputation: 133
I would like to compute the clustered standard errors for zero-inflated negative binomial model. By default, zeroinfl
(from the pscl
package) returns standard errors derived using the Hessian matrix returned by optim
, e.g.:
library(pscl)
data("bioChemists", package = "pscl")
dim(bioChemists)
head(bioChemists)
## default start values
fm1 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin"))
summary(fm1)
Is there a way to use an asymmetrical/symmetrical distance matrix between observations OR use one of the variables (e.g. kid5
in the toy dataset) to compute clustered standard error?
I found this from an answer at stackexchange, but I am not sure how/whether it can be used with zero-inflated models. The equivalent in Stata's rzinb
would probably be cluster clustvar
under vce
: http://www.stata.com/manuals13/rzinb.pdf .
Any ideas?
Upvotes: 2
Views: 2481
Reputation: 17183
The development version of the sandwich
package on R-Forge has been extended to allow for object-oriented computation of clustered covariances. This also supports zero-inflated regression models. You can install the devel version from R-Forge via:
install.packages("sandwich", repos = "http://R-Forge.R-project.org")
And then load all required packages. The lmtest
package is used for the coeftest()
function into which covariance matrix estimates can be plugged in.
library("pscl")
library("sandwich")
library("lmtest")
The illustration model you used is the following.
data("bioChemists", package = "pscl")
fm1 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin")
The coeftest()
function by default returns the same marginal Wald tests as summary()
.
coeftest(fm1)
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## count_(Intercept) 0.41674653 0.14359655 2.9022 0.003796 **
## count_femWomen -0.19550683 0.07559256 -2.5863 0.009856 **
## count_marMarried 0.09758263 0.08445195 1.1555 0.248199
## count_kid5 -0.15173246 0.05420606 -2.7992 0.005233 **
## count_phd -0.00070013 0.03626966 -0.0193 0.984603
## count_ment 0.02478620 0.00349267 7.0966 2.587e-12 ***
## zero_(Intercept) -0.19168829 1.32281889 -0.1449 0.884815
## zero_femWomen 0.63593320 0.84891762 0.7491 0.453986
## zero_marMarried -1.49946849 0.93867060 -1.5974 0.110518
## zero_kid5 0.62842720 0.44278263 1.4193 0.156166
## zero_phd -0.03771474 0.30800817 -0.1224 0.902572
## zero_ment -0.88229322 0.31622813 -2.7901 0.005381 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This can then be easily extended to employ a clustered covariance matrix estimate using the vcovCL()
function. Here, the kid5
variable is employed as you suggested. (Note, if someone else is reading this: The usage of kid5
is just to show things "work" but does not really make sense in this application.)
coeftest(fm1, vcov = vcovCL(fm1, cluster = bioChemists$kid5))
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## count_(Intercept) 0.41674653 0.17009748 2.4500 0.01447 *
## count_femWomen -0.19550683 0.01701325 -11.4914 < 2.2e-16 ***
## count_marMarried 0.09758263 0.02401883 4.0628 5.272e-05 ***
## count_kid5 -0.15173246 0.03612916 -4.1997 2.938e-05 ***
## count_phd -0.00070013 0.04852615 -0.0144 0.98849
## count_ment 0.02478620 0.00263208 9.4170 < 2.2e-16 ***
## zero_(Intercept) -0.19168829 0.51865043 -0.3696 0.71177
## zero_femWomen 0.63593320 0.87775846 0.7245 0.46895
## zero_marMarried -1.49946849 1.03481783 -1.4490 0.14768
## zero_kid5 0.62842720 0.35073624 1.7917 0.07351 .
## zero_phd -0.03771474 0.13873870 -0.2718 0.78581
## zero_ment -0.88229322 0.07481264 -11.7934 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Upvotes: 4