Reputation: 11
I want to run an IV regression in R using ivreg from the AER package. The output reports a negative R^2, which should be impossible as far as I know. When I run the same regression manually as 2SLS, the R^2 is positive, although very small.
As far as I can tell, this happens because the AER package computes the residuals from the actual X rather than from the fitted values of the first stage. The R^2 only turns negative when the fit is quite poor, but it always differs between ivreg and manual 2SLS. My question is whether the calculation of R^2 in the AER package is wrong, or whether R^2 can legitimately be negative under these circumstances. Here is some code to reproduce the negative R^2:
library(AER)
set.seed(40)
n <- 100
# Data generation
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 10000)
Y <- 3 * X + rnorm(n, 0, 1000000)
df <- data.frame(Z, X, Y)
# IV regression
ivreg1 <- ivreg(Y ~ X | Z, data = df)
summary(ivreg1)
# 2SLS approach
lm1 <- lm(X ~ Z, data = df)
df$predict <- predict(lm1)
lm2 <- lm(Y ~ predict, data = df)
summary(lm2)
The output of the ivreg function:
Call:
ivreg(formula = Y ~ X | Z, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-3513062  -843258   -33611   845922  4533273
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  43430.1   165298.8   0.263    0.793
X             -114.1      131.9  -0.865    0.389
Residual standard error: 1553000 on 98 degrees of freedom
Multiple R-Squared: -1.665, Adjusted R-squared: -1.692
Wald test: 0.7479 on 1 and 98 DF, p-value: 0.3893
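The difference between the two R^2 values can be checked by hand. This is a minimal sketch, assuming R^2 is defined as 1 - RSS/TSS with the TSS taken around the mean of Y. The names fit_iv, r2_actual and r2_fitted are only illustrative. Both versions use the same IV coefficients and differ only in which X enters the residuals:

```r
library(AER)

# Same data-generating process as above
set.seed(40)
n <- 100
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 10000)
Y <- 3 * X + rnorm(n, 0, 1000000)
df <- data.frame(Z, X, Y)

fit_iv <- ivreg(Y ~ X | Z, data = df)
b <- coef(fit_iv)

# Total sum of squares around the mean of Y (the constant-only model)
tss <- sum((df$Y - mean(df$Y))^2)

# Residuals built from the actual X, using the IV coefficients
res_actual <- df$Y - (b[1] + b[2] * df$X)
r2_actual <- 1 - sum(res_actual^2) / tss

# Residuals built from the first-stage fitted values, same coefficients
x_hat <- fitted(lm(X ~ Z, data = df))
res_fitted <- df$Y - (b[1] + b[2] * x_hat)
r2_fitted <- 1 - sum(res_fitted^2) / tss

# Compare with the value reported by summary()
c(ivreg = summary(fit_iv)$r.squared, actual_X = r2_actual, fitted_X = r2_fitted)
```

The fitted_X value is an ordinary OLS R^2 from the second stage, so it cannot be negative. The actual_X value has no such bound, because the IV coefficients do not minimize that residual sum of squares.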
Upvotes: 1
Views: 55
Reputation: 3902
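R^2 is computed as 1 - RSS/TSS, so it compares the residual sum of squares of the fitted model with that of the constant-only model. It can be negative when the constant-only model is not nested within the fitted model, which is what happens with IV regression: the IV coefficients are not chosen to minimize the RSS computed from the actual X, so that RSS can exceed the TSS. The main issue here is that the standard deviations of the noise used to generate X and Y are too large (sd = 10000 and 1000000, respectively). Against noise of that size the 2 * Z term is negligible, so Z is a very weak instrument for X. Lowering the sd gives the desired output: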
library(AER)
n <- 100  # same sample size as in the question
# Data generation
Z <- rnorm(n, 10, 2)
X <- 2 * Z + rnorm(n, 0, 1)
Y <- 3 * X + rnorm(n, 0, 1)
df <- data.frame(Z, X, Y)
# IV regression
ivreg1 <- ivreg(Y ~ X | Z, data = df)
summary(ivreg1)
Call:
ivreg(formula = Y ~ X | Z, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-2.42247 -0.60029 -0.09537  0.83643  2.01900
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.6215     0.4983  -1.247    0.215
X             3.0354     0.0242 125.449   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9427 on 98 degrees of freedom
Multiple R-Squared: 0.9942, Adjusted R-squared: 0.9942
Wald test: 1.574e+04 on 1 and 98 DF, p-value: < 2.2e-16
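One way to see the effect of the noise level is to compare the first-stage F statistic for Z under both settings. This is a rough sketch in base R. The helper first_stage_F is a hypothetical name introduced here, and the often-quoted threshold of F > 10 is only a rule of thumb, not a formal test:

```r
set.seed(40)
n <- 100
Z <- rnorm(n, 10, 2)

# Two versions of X: noise sd from the question vs. the lowered sd
X_weak   <- 2 * Z + rnorm(n, 0, 10000)
X_strong <- 2 * Z + rnorm(n, 0, 1)

# Hypothetical helper: F statistic of the first-stage regression of x on z
first_stage_F <- function(x, z) {
  unname(summary(lm(x ~ z))$fstatistic["value"])
}

F_weak   <- first_stage_F(X_weak, Z)
F_strong <- first_stage_F(X_strong, Z)

# A small F means Z explains almost none of the variation in X
c(weak = F_weak, strong = F_strong)
```

The summary method for ivreg objects in AER also accepts diagnostics = TRUE, which reports a weak-instruments test alongside the coefficients.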
Upvotes: 0