Reputation: 1716
There are many ways to compare regression models in a side-by-side table in R, including the packages stargazer
, huxtable
, and gtsummary
. I'm struggling to do this with two zero-inflated negative binomial models when one has clustered standard errors and one doesn't.
The trouble lies in the different object classes. The clustered error model is in class "coeftest" from the pscl
package while the unclustered model is class "zeroinfl" from the lmtest
package.
Here is a simple example.
library(tidyverse)
library(pscl)
library(sandwich)
library(lmtest)
## data
data("bioChemists", package = "pscl")
set.seed(3.14)
zinb <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv") %>%
# add some random data representing lakes
mutate(lake = sample(c("lake1", "lake2", "lake3"), n(), replace = TRUE))
# zero inflated model without clusters
no.clusters <- zeroinfl(count ~ child + camper,
data = zinb, dist = "negbin")
# cluster by lake
with.clusters <- zeroinfl(count ~ child + camper + factor(lake),
data = zinb, dist = "negbin")
v_lake = vcovCL(with.clusters, type = "HC1", cluster = ~lake)
with.clusters.final <- coeftest(with.clusters, v_lake)
How can I generate a table comparing the objects no.cluster
and with.clusters.final
?
Upvotes: 1
Views: 956
Reputation: 17715
The modelsummary
package accepts a vcov
argument which can be a list of strings (e.g., "robust"
or "classical"
) or a one-sided formula with the cluster variable(s). Under the hood, it will then automagically call functions from the sandwich
package to compute the uncertainty estimates. (Disclaimer: I am the package maintainer).
Note that a note is automatically appended to the bottom of the table to indicate the kind of standard error.
library(pscl)
library(modelsummary)
set.seed(3.14)
zinb <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv")
zinb$lake <- sample(c("lake1", "lake2", "lake3"), nrow(zinb), replace = TRUE)
mod1 <- zeroinfl(count ~ child + camper,
data = zinb, dist = "negbin")
mod2 <- zeroinfl(count ~ child + camper + factor(lake),
data = zinb, dist = "negbin")
modelsummary(list(mod1, mod2),
vcov = list("classical", ~lake),
stars = TRUE)
Upvotes: 1
Reputation: 2262
Here's a simpler approach: use coeftest
for the no-cluster object, but without clusters. You can then use e.g. huxtable::huxreg()
to print out your coefficients:
with.clusters.final <- coeftest(with.clusters, v_lake, save = TRUE)
no.clusters.coeftest <- coeftest(no.clusters, save = TRUE)
huxreg(no.clusters.coeftest, with.clusters.final)
───────────────────────────────────────────────────────
(1) (2)
─────────────────────────────
count_(Intercept) 1.052 *** 1.087 ***
(0.270) (0.116)
count_child -0.911 ** -0.821 *
(0.285) (0.336)
count_camper 0.798 ** 0.806 ***
(0.305) (0.159)
zero_(Intercept) -11.499 -21.326 ***
(55.699) (5.362)
zero_child 10.483 11.734 ***
(55.659) (2.420)
zero_camper -9.501 -3.071 ***
(55.663) (0.852)
count_factor(lake)lake2 0.182
(0.195)
count_factor(lake)lake3 -0.595 ***
(0.108)
zero_factor(lake)lake2 10.878 ***
(2.609)
zero_factor(lake)lake3 1.353 ***
(0.223)
─────────────────────────────
nobs
───────────────────────────────────────────────────────
*** p < 0.001; ** p < 0.01; * p < 0.05.
Selecting rows, and giving them nice names, can be done with the coefs
argument.
Upvotes: 0
Reputation: 1716
I realize I was thinking about this problem incorrectly. Instead of trying to directly compare no.cluster
with the object generated using coeftest
, I just need to replace the standard errors in with.clusters
with the robust values generated by coeftest
.
This is a little more complicated than it sounds, since I'm using a zero-inflated regression.
Zero-inflated regression consists of two models, both of which are included in the output of coeftest
. I only care about the values beginning with count_
.
> with.clusters.final
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
count_(Intercept) 1.08748 0.11608 9.3686 < 0.00000000000000022 ***
count_child -0.82116 0.33572 -2.4460 0.0151678 *
count_camper 0.80562 0.15933 5.0563 0.000000850945 ***
count_factor(lake)lake2 0.18206 0.19467 0.9352 0.3506175
count_factor(lake)lake3 -0.59536 0.10789 -5.5184 0.000000088810 ***
zero_(Intercept) -21.32602 5.36229 -3.9770 0.000092516485 ***
zero_child 11.73360 2.42002 4.8486 0.000002241063 ***
zero_camper -3.07137 0.85151 -3.6070 0.0003772 ***
zero_factor(lake)lake2 10.87828 2.60947 4.1688 0.000042846049 ***
zero_factor(lake)lake3 1.35318 0.22321 6.0624 0.000000005192 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I need to drop the rows beginning with zero_
, then remove the "count_" prefix from the remaining names.
> clustered.se <- with.clusters.final[,2]
> clustered.se <- clustered.se[str_sub(names(clustered.se), 1, 5) == "count"]
> names(clustered.se) <- str_remove(names(clustered.se), "count_")
> clustered.se
(Intercept) child camper factor(lake)lake2 factor(lake)lake3
0.1160763 0.3357177 0.1593300 0.1946651 0.1078864
We can do the same thing for test statistics and p-values. Here is a function so simplify the process.
clustered_stat <- function(coeftest_object, statistic){
statistic.row <- if(statistic == "se"){
2
} else if(statistic == "t") {
3
} else if(statistic == "p") {
4
}
# pull the appropraite row out of the coeftest object
new.values <- coeftest_object[,statistic.row]
# just keep the values from the count model, not the logit model
new.values <- new.values[str_sub(names(new.values), 1, 5) == "count"]
# remove the "count_" prefix from the names of the values
names(new.values) <- str_remove(names(new.values), "count_")
new.values
}
Now I can supply the original clustered zeroinfl
object to stargazer, replacing the statistics with their robust versions created by coeftest
.
> stargazer::stargazer(no.clusters, with.clusters,
+ se = list(NULL, clustered_stat(with.clusters.final, "se")),
+ t = list(NULL, clustered_stat(with.clusters.final, "t")),
+ p = list(NULL, clustered_stat(with.clusters.final, "p")),
+ type = "text",
+ column.labels = c("no clusters", "with clusters"))
==============================================
Dependent variable:
----------------------------
count
no clusters with clusters
(1) (2)
----------------------------------------------
child -0.911*** -0.821**
(0.285) (0.336)
camper 0.798*** 0.806***
(0.305) (0.159)
factor(lake)lake2 0.182
(0.195)
factor(lake)lake3 -0.595***
(0.108)
Constant 1.052*** 1.087***
(0.270) (0.116)
----------------------------------------------
Observations 250 250
Log Likelihood -434.906 -430.784
==============================================
Note: *p<0.1; **p<0.05; ***p<0.01
Upvotes: 0