Ignacio
Ignacio

Reputation: 7938

Clustered standard errors with texreg?

I'm trying to reproduce this stata example and move from stargazer to texreg. The data is available here.

To run the regression and get the se I run this code:

library(readstata13)
library(sandwich)
cluster_se <- function(model_result, data, cluster){
  model_variables   <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- as.integer(rownames(model_result$model))
  data <- data[model_rows, model_variables]

  cl <- data[[cluster]]
  M <- length(unique(cl))
  N <- nrow(data)
  K <- model_result$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
  vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
  sqrt(diag(vcovCL))
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <- lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll, data = elemapi2)
se.lm1 <- cluster_se(model_result = lm1, data = elemapi2, cluster = "dnum")

stargazer::stargazer(lm1, type = "text", style = "aer", se = list(se.lm1))

==========================================================
  api00                 
----------------------------------------------------------
  acs_k3                              6.954                 
                                      (6.901)                

acs_46                             5.966**                
                                   (2.531)                

full                               4.668***               
                                   (0.703)                

enroll                             -0.106**               
                                   (0.043)                

Constant                            -5.200                
                                    (121.786)               

Observations                         395                  
R2                                  0.385                 
Adjusted R2                         0.379                 
Residual Std. Error           112.198 (df = 390)          
F Statistic                61.006*** (df = 4; 390)        
----------------------------------------------------------
  Notes:              ***Significant at the 1 percent level.
**Significant at the 5 percent level. 
*Significant at the 10 percent level. 

texreg produces this:

texreg::screenreg(lm1, override.se=list(se.lm1))    

========================
  Model 1    
------------------------
  (Intercept)    -5.20    
                 (121.79)   
acs_k3          6.95    
                (6.90)   
acs_46          5.97 ***
                (2.53)   
full            4.67 ***
                (0.70)   
enroll         -0.11 ***
                (0.04)   
------------------------
  R^2             0.38    
Adj. R^2        0.38    
Num. obs.     395       
RMSE          112.20    
========================  

How can I fix the p-values?

Upvotes: 0

Views: 1811

Answers (2)

coffeinjunky
coffeinjunky

Reputation: 11514

First, notice that your usage of as.integer is dangerous and likely to cause problems once you use data with non-numeric rownames. For instance, using the built-in dataset mtcars whose rownames consist of car names, your function will coerce all rownames to NA, and your function will not work.

To your actual question, you can provide custom p-values to texreg, which means that you need to compute the corresponding p-values. To achieve this, you could compute the variance-covariance matrix, compute the test-statistics, and then compute the p-value manually, or you just compute the variance-covariance matrix and supply it to e.g. coeftest. Then you can extract the standard errors and p-values from there. Since I am unwilling to download any data, I use the mtcars-data for the following:

library(sandwich)
library(lmtest)
library(texreg)

cluster_se <- function(model_result, data, cluster){
  model_variables   <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- rownames(model_result$model) # changed to be able to work with mtcars, not tested with other data
  data <- data[model_rows, model_variables]
  cl <- data[[cluster]]
  M <- length(unique(cl))
  N <- nrow(data)
  K <- model_result$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
  vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
}

lm1 <- lm(formula = mpg ~ cyl + disp, data = mtcars)
vcov.lm1 <- cluster_se(model_result = lm1, data = mtcars, cluster = "carb")

standard.errors <- coeftest(lm1, vcov. = vcov.lm1)[,2]
p.values <- coeftest(lm1, vcov. = vcov.lm1)[,4]

texreg::screenreg(lm1, override.se=standard.errors, override.p = p.values)    

And just for completeness sake, let's do it manually:

t.stats <- abs(coefficients(lm1) / sqrt(diag(vcov.lm1)))
t.stats
(Intercept)         cyl        disp 
  38.681699    5.365107    3.745143 

These are your t-statistics using the cluster-robust standard errors. The degree of freedom is stored in lm1$df.residual, and using the built in functions for the t-distribution (see e.g. ?pt), we get:

manual.p <- 2*pt(-t.stats, df=lm1$df.residual)
manual.p
 (Intercept)          cyl         disp 
1.648628e-26 9.197470e-06 7.954759e-04 

Here, pt is the distribution function, and we want to compute the probability of observing a statistic at least as extreme as the one we observe. Since we testing two-sided and it is a symmetric density, we first take the left extreme using the negative value, and then double it. This is identical to using 2*(1-pt(t.stats, df=lm1$df.residual)). Now, just to check that this yields the same result as before:

all.equal(p.values, manual.p)
[1] TRUE

Upvotes: 4

flexponsive
flexponsive

Reputation: 6168

Robust Standard Errors with texreg are easy: just pass the coeftest directly!

This has become much easier since the question was last answered: it appears you can now just pass the coeftest with the desired variance-covariance matrix directly. Downside: you lose the goodness of fit statistics (such as R^2 and number of observations), but depending on your needs, this may not be a big problem

How to include robust standard errors with texreg

> screenreg(list(reg1, coeftest(reg1,vcov = vcovHC(reg1, 'HC1'))), 
      custom.model.names = c('Standard Standard Errors', 'Robust Standard Errors'))

=============================================================
             Standard Standard Errors  Robust Standard Errors
-------------------------------------------------------------
(Intercept)  -192.89 ***               -192.89 *             
              (55.59)                   (75.38)              
x               2.84 **                   2.84 **            
               (0.96)                    (1.04)              
-------------------------------------------------------------
R^2             0.08                                         
Adj. R^2        0.07                                         
Num. obs.     100                                            
RMSE          275.88                                         
=============================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

To generate this example, I created a dataframe with heteroscedasticity, see below for full runnable sample code:

require(sandwich);
require(texreg);

set.seed(1234)
df <- data.frame(x = 1:100);
df$y <- 1 + 0.5*df$x + 5*100:1*rnorm(100)

reg1 <- lm(y ~ x, data = df)

Upvotes: 4

Related Questions