Clustered standard errors with texreg?

Question

I'm trying to reproduce this stata example and move from stargazer to texreg. The data is available here.

To run the regression and get the se I run this code:

library(readstata13)
library(sandwich)
cluster_se <- function(model_result, data, cluster){
  model_variables   <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- as.integer(rownames(model_result$model))
  data <- data[model_rows, model_variables]

  cl <- data[[cluster]]
  M <- length(unique(cl))
  N <- nrow(data)
  K <- model_result$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
  vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
  sqrt(diag(vcovCL))
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <- lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll, data = elemapi2)
se.lm1 <- cluster_se(model_result = lm1, data = elemapi2, cluster = "dnum")

stargazer::stargazer(lm1, type = "text", style = "aer", se = list(se.lm1))

==========================================================
  api00                 
----------------------------------------------------------
  acs_k3                              6.954                 
                                      (6.901)                

acs_46                             5.966**                
                                   (2.531)                

full                               4.668***               
                                   (0.703)                

enroll                             -0.106**               
                                   (0.043)                

Constant                            -5.200                
                                    (121.786)               

Observations                         395                  
R2                                  0.385                 
Adjusted R2                         0.379                 
Residual Std. Error           112.198 (df = 390)          
F Statistic                61.006*** (df = 4; 390)        
----------------------------------------------------------
  Notes:              ***Significant at the 1 percent level.
**Significant at the 5 percent level. 
*Significant at the 10 percent level.

texreg produces this:

texreg::screenreg(lm1, override.se=list(se.lm1))    

========================
  Model 1    
------------------------
  (Intercept)    -5.20    
                 (121.79)   
acs_k3          6.95    
                (6.90)   
acs_46          5.97 ***
                (2.53)   
full            4.67 ***
                (0.70)   
enroll         -0.11 ***
                (0.04)   
------------------------
  R^2             0.38    
Adj. R^2        0.38    
Num. obs.     395       
RMSE          112.20    
========================

How can I fix the p-values?

coffeinjunky · Accepted Answer

First, notice that your usage of as.integer is dangerous and likely to cause problems once you use data with non-numeric rownames. For instance, using the built-in dataset mtcars whose rownames consist of car names, your function will coerce all rownames to NA, and your function will not work.

To your actual question, you can provide custom p-values to texreg, which means that you need to compute the corresponding p-values. To achieve this, you could compute the variance-covariance matrix, compute the test-statistics, and then compute the p-value manually, or you just compute the variance-covariance matrix and supply it to e.g. coeftest. Then you can extract the standard errors and p-values from there. Since I am unwilling to download any data, I use the mtcars-data for the following:

library(sandwich)
library(lmtest)
library(texreg)

cluster_se <- function(model_result, data, cluster){
  model_variables   <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- rownames(model_result$model) # changed to be able to work with mtcars, not tested with other data
  data <- data[model_rows, model_variables]
  cl <- data[[cluster]]
  M <- length(unique(cl))
  N <- nrow(data)
  K <- model_result$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
  vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
}

lm1 <- lm(formula = mpg ~ cyl + disp, data = mtcars)
vcov.lm1 <- cluster_se(model_result = lm1, data = mtcars, cluster = "carb")

standard.errors <- coeftest(lm1, vcov. = vcov.lm1)[,2]
p.values <- coeftest(lm1, vcov. = vcov.lm1)[,4]

texreg::screenreg(lm1, override.se=standard.errors, override.p = p.values)

And just for completeness sake, let's do it manually:

t.stats <- abs(coefficients(lm1) / sqrt(diag(vcov.lm1)))
t.stats
(Intercept)         cyl        disp 
  38.681699    5.365107    3.745143

These are your t-statistics using the cluster-robust standard errors. The degree of freedom is stored in lm1$df.residual, and using the built in functions for the t-distribution (see e.g. ?pt), we get:

manual.p <- 2*pt(-t.stats, df=lm1$df.residual)
manual.p
 (Intercept)          cyl         disp 
1.648628e-26 9.197470e-06 7.954759e-04

Here, pt is the distribution function, and we want to compute the probability of observing a statistic at least as extreme as the one we observe. Since we testing two-sided and it is a symmetric density, we first take the left extreme using the negative value, and then double it. This is identical to using 2*(1-pt(t.stats, df=lm1$df.residual)). Now, just to check that this yields the same result as before:

all.equal(p.values, manual.p)
[1] TRUE

Clustered standard errors with texreg?

Answers (2)

Robust Standard Errors with `texreg` are easy: just pass the coeftest directly!

How to include robust standard errors with texreg

Related Questions

Clustered standard errors with texreg?

Answers (2)

Robust Standard Errors with texreg are easy: just pass the coeftest directly!

How to include robust standard errors with texreg

Related Questions

Robust Standard Errors with `texreg` are easy: just pass the coeftest directly!