Reputation: 271
I have been trying to manually reproduce the result given by the Sargan test in R, sadly to no avail.
When I run ivreg() and then output the Sargan test statistic:
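library(readstata13) # read.dta13()
library(ivreg)       # ivreg(); the AER package exports a function of the same name
eitc <- read.dta13('education_earnings_v2.dta')
eitc$ln.wage <- log(eitc$wage)
# Structural equation | instruments (excluded: nearc4, nearc2) plus exogenous controls
TSLS <- ivreg(data = eitc, ln.wage ~ educ + exper + south + nonwhite
              | nearc4 + nearc2 + exper + south + nonwhite)
summary(TSLS, diagnostics = TRUE)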
I get a Sargan statistic of 1.63. However, when I try to perform the test manually:
surp_IV1 <- lm(educ ~ nearc2 + nearc4 + exper + south + nonwhite, data=eitc)
surp_IV_fit <- surp_IV1$fitted.values
surp_IV2 <- lm(ln.wage ~ surp_IV_fit + exper + south + nonwhite, data=eitc)
surp_resid <- resid(surp_IV2)
test_surplus <- lm(surp_resid ~ nearc2 + nearc4 + exper + south + nonwhite,
data = eitc)
summary(test_surplus)
With R-squared = 0.0008032 on 3,010 observations, I get a test statistic of 3,010 × 0.0008032 ≈ 2.42.
What is the reason for the difference?
Upvotes: 0
Views: 223
Reputation: 330
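Some of the manual steps are unnecessary, and one of them changes the result. The auxiliary regression should use the 2SLS residuals, resid(TSLS), which are computed with the actual educ. The residuals of a second-stage lm() fitted on the first-stage fitted values are a different series, so n * R-squared comes out as 2.42 instead of 1.63.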
The procedure follows Section 15-5b, "Testing Overidentification Restrictions", in Wooldridge (2019, 7th edition).
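To see which residuals the test needs, here is a minimal base-R sketch on simulated data. Every variable name and coefficient in it is made up for illustration and has nothing to do with the card data. It fits the two stages by hand, then builds the n * R-squared statistic twice: once from the 2SLS residuals (actual regressor plugged back in) and once from the raw second-stage residuals.

```r
# Simulated data: all names and coefficients are hypothetical
set.seed(1)
n  <- 500
z1 <- rnorm(n)                            # instrument 1
z2 <- rnorm(n)                            # instrument 2
w  <- rnorm(n)                            # exogenous control
v  <- rnorm(n)                            # shock that makes x endogenous
x  <- 0.5 * z1 + 0.5 * z2 + 0.3 * w + v   # endogenous regressor
y  <- 1 + 0.5 * x + 0.2 * w + 0.8 * v + rnorm(n)

first  <- lm(x ~ z1 + z2 + w)             # first stage
x_hat  <- fitted(first)
second <- lm(y ~ x_hat + w)               # point estimates equal the 2SLS estimates

b <- coef(second)
# 2SLS residuals use the ACTUAL x, not x_hat
u_2sls   <- y - (b[["(Intercept)"]] + b[["x_hat"]] * x + b[["w"]] * w)
# Second-stage residuals use x_hat (this is what the manual attempt used)
u_stage2 <- resid(second)

# Auxiliary regressions on all exogenous variables, then n * R-squared
sargan_2sls   <- n * summary(lm(u_2sls   ~ z1 + z2 + w))$r.squared
sargan_stage2 <- n * summary(lm(u_stage2 ~ z1 + z2 + w))$r.squared
c(from_2sls_resid = sargan_2sls, from_stage2_resid = sargan_stage2)
```

The two numbers differ because the two residual series have different total sums of squares, even though their projections on the instruments coincide. The reprex below sidesteps the issue by taking resid(TSLS) straight from the fitted ivreg object.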
library(wooldridge) # data(card)
library(dplyr) # rename()
library(ivreg) # ivreg()
data(card)
eitc <- card |>
rename(nonwhite = black)
TSLS <- ivreg(lwage ~ educ + exper + south + nonwhite
| nearc4 + nearc2 + exper + south + nonwhite,
data = eitc)
summary(TSLS, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = lwage ~ educ + exper + south + nonwhite | nearc4 +
#> nearc2 + exper + south + nonwhite, data = eitc)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.309361 -0.319674 0.007403 0.334821 1.783133
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.17357 0.70494 3.083 0.00207 **
#> educ 0.24131 0.04089 5.901 4.02e-09 ***
#> exper 0.10453 0.01660 6.296 3.50e-10 ***
#> south -0.08534 0.02647 -3.224 0.00128 **
#> nonwhite -0.01541 0.04644 -0.332 0.74009
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 2 3004 19.682 3.22e-09 ***
#> Wu-Hausman 1 3004 27.580 1.61e-07 ***
#> Sargan 1 NA 1.626 0.202
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4998 on 3005 degrees of freedom
#> Multiple R-Squared: -0.2668, Adjusted R-squared: -0.2685
#> Wald test: 87.8 on 4 and 3005 DF, p-value: < 2.2e-16
TSLS_resid <- resid(TSLS)
surp_IV1 <- lm(TSLS_resid ~ nearc2 + nearc4 + exper + south + nonwhite,
data = eitc)
nobs(surp_IV1) * summary(surp_IV1)[["r.squared"]] # number of observations * R-squared
#> [1] 1.625811
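As a follow-up check, the p-value can be recovered by hand as well. The sketch below assumes the usual chi-squared reference distribution with degrees of freedom equal to the number of excluded instruments minus the number of endogenous regressors (here 2 - 1 = 1, matching df1 in the Sargan row above). The statistic is copied from the output rather than recomputed, and the variable names are only illustrative.

```r
# Statistic copied from the n * R-squared output above
sargan_stat <- 1.625811
# Overidentifying restrictions: 2 excluded instruments - 1 endogenous regressor
q <- 2 - 1
# Upper-tail chi-squared probability
p_value <- pchisq(sargan_stat, df = q, lower.tail = FALSE)
round(p_value, 3)
```

The result should agree with the 0.202 that summary(TSLS, diagnostics = TRUE) reports in the Sargan row.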
Created on 2023-12-15 with reprex v2.0.2
Upvotes: 0