user27435905

Reputation: 11

Different R^2 from manual 2SLS and the ivreg function (AER)

I want to run an IV regression in R using ivreg from the AER package. The output gives me a negative R^2, which should be impossible as far as I know. When I run the same regression manually as 2SLS, the R^2 is positive, although very small.

As far as I can tell, this happens because ivreg uses the observed X, not the fitted values from the first stage, to calculate the residuals. The negative value shows up when the fit is poor, but the R^2 always differs between ivreg and manual 2SLS. My question is whether the calculation of R^2 in the AER package is wrong, or whether R^2 can legitimately be negative under these circumstances. Here is some code to reproduce the negative R^2:

library(AER)
set.seed(40)
n <- 100

# Data generation
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 10000)
Y <- 3 * X + rnorm(n, 0, 1000000)
df <- data.frame(Z, X, Y)

# IV regression
ivreg1 <- ivreg(Y ~ X | Z, data = df)
summary(ivreg1)

# 2SLS approach
lm1 <- lm(X ~ Z, data = df)
df$predict <- predict(lm1)
lm2 <- lm(Y ~ predict, data = df)
summary(lm2)

The output of the ivreg function:

Call:
ivreg(formula = Y ~ X | Z, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-3513062  -843258   -33611   845922  4533273 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  43430.1   165298.8   0.263    0.793
X             -114.1      131.9  -0.865    0.389

Residual standard error: 1553000 on 98 degrees of freedom
Multiple R-Squared: -1.665, Adjusted R-squared: -1.692 
Wald test: 0.7479 on 1 and 98 DF,  p-value: 0.3893 
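
To make the difference concrete, here is a minimal base-R sketch (no AER needed) that computes both R^2 values by hand on the same simulated data. The helper r2 and the variable names are my own, and I am assuming R^2 is defined as 1 - RSS/TSS in both cases. The first value is bounded between 0 and 1 by construction, while nothing bounds the second from below.

```r
set.seed(40)
n <- 100
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 10000)
Y <- 3 * X + rnorm(n, 0, 1000000)

# First stage: project X on the instrument Z
X_hat <- fitted(lm(X ~ Z))

# Second stage: intercept and slope are the 2SLS estimates
b <- unname(coef(lm(Y ~ X_hat)))

# Hypothetical helper: R^2 = 1 - RSS / TSS for a given residual vector
r2 <- function(res, y) 1 - sum(res^2) / sum((y - mean(y))^2)

# Residuals built from the fitted X_hat (what the manual second stage reports)
res_second_stage <- Y - (b[1] + b[2] * X_hat)
# Residuals built from the observed X (same coefficients, different regressor)
res_structural <- Y - (b[1] + b[2] * X)

r2_second_stage <- r2(res_second_stage, Y)
r2_structural <- r2(res_structural, Y)
c(second_stage = r2_second_stage, structural = r2_structural)
```

Both residual vectors use the same coefficients; only the regressor plugged into the fitted values changes, which is enough to move the R^2.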

Upvotes: 1

Views: 55

Answers (1)

one

Reputation: 3902

R^2 compares the model sum of squares (MSS) of the full model with that of the constant-only model. It can be negative when the two models are not nested, which is what happens with IV regression. The main issue here is that the standard deviations of the noise used to generate X and Y are far too large (sd = 10000 and 1000000, respectively). Lowering them gives the output you would expect:

library(AER)
n <- 100  # same sample size as in the question

# Data generation
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 1)
Y <- 3 * X + rnorm(n, 0, 1)
df <- data.frame(Z, X, Y)

# IV regression
ivreg1 <- ivreg(Y ~ X | Z, data = df)
summary(ivreg1)

Call:
ivreg(formula = Y ~ X | Z, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.42247 -0.60029 -0.09537  0.83643  2.01900 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.6215     0.4983  -1.247    0.215    
X             3.0354     0.0242 125.449   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9427 on 98 degrees of freedom
Multiple R-Squared: 0.9942, Adjusted R-squared: 0.9942 
Wald test: 1.574e+04 on 1 and 98 DF,  p-value: < 2.2e-16 
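
One way to see why the noise level matters is to compare the first-stage F statistic of X on Z under both settings. The following is a rough base-R sketch, not part of AER; the helper first_stage_F is a made-up name for illustration. A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument.

```r
set.seed(40)
n <- 100
Z <- rnorm(n, 10, 2)

# Hypothetical helper: F statistic of the first-stage regression X ~ Z
# when X is generated with the given noise standard deviation
first_stage_F <- function(noise_sd) {
  X <- 2 * Z + rnorm(n, 0, noise_sd)
  unname(summary(lm(X ~ Z))$fstatistic["value"])
}

F_noisy <- first_stage_F(10000)  # noise sd used in the question
F_clean <- first_stage_F(1)      # noise sd used in this answer
c(noisy = F_noisy, clean = F_clean)
```

With sd = 1 the instrument explains most of the variation in X, so the IV estimate is well determined and the R^2 reported by ivreg behaves as expected.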

Upvotes: 0
