Reputation: 11
I am trying to build a fixed effects regression with the plm package in R. I am using country level panel data with year and country fixed effects.
My problem concerns two explanatory variables: one is an interaction term of two variables, and the other is the squared term of one of those variables.
The model is basically: y = x1 + x1^2 + x3 + x1*x3 + ... + xn, with all variables in log form.
It is central to the model to include the squared term, but when I run the regression it is always dropped because of "singularities", as x1 and x1^2 are obviously correlated. The regression runs and I get estimates for my other variables, just not for x1^2 and x1*x2. How do I circumvent this?
library(plm)
fe_reg <- plm(log(y) ~ log(x1) + log(x2) + log(x2^2) + log(x1 * x2) + dummy,
              data = df,
              index = c("country", "year"),
              model = "within",
              effect = "twoways")
summary(fe_reg)
I have tried defining the interaction and squared terms as vectors, which helped with the interaction term but not the squared term.
library(dplyr)
# scale both variables in one step, so the second call does not overwrite the first
df1.pd <- df1 %>% mutate(across(c(x1, x2), ~ as.vector(scale(.x))))
Upvotes: 1
Reputation: 3687
You just found two properties of the logarithm function:
log(x^2) = 2 * log(x)
log(x*y) = log(x) + log(y)
So log(x^2) is just 2*log(x), which is perfectly collinear with log(x), and one of the two collinear variables is dropped from the estimation. The same holds for log(x*y), which is an exact linear combination of log(x) and log(y).
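To make the collinearity concrete, here is a minimal base R sketch. The data are hypothetical random draws (not the asker's panel), and the formula simply mirrors the regressors from the question. It checks both identities numerically and then compares the number of columns of the design matrix with its rank.

```r
# Hypothetical positive data, for illustration only
set.seed(1)
x <- runif(50, min = 1, max = 10)
y <- runif(50, min = 1, max = 10)

# Both identities hold up to floating-point error
stopifnot(isTRUE(all.equal(log(x^2), 2 * log(x))))
stopifnot(isTRUE(all.equal(log(x * y), log(x) + log(y))))

# Design matrix with the same kind of regressors as in the question
X <- model.matrix(~ log(x) + log(y) + log(y^2) + log(x * y))

n_cols <- ncol(X)     # intercept plus four regressors
n_rank <- qr(X)$rank  # number of linearly independent columns
c(columns = n_cols, rank = n_rank)
```

The rank falls short of the column count by two, one for each redundant term, which is why two coefficients cannot be estimated.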
So the model as specified cannot be estimated by linear regression methods. You might want to consider a transformation other than log, or use the original variables.
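One possible alternative, offered only as a sketch: if what is actually wanted is the square of the logged variable, (log x)^2, and the product of the two logs, log(x1) * log(x2), those terms are not exact linear combinations of log(x1) and log(x2). Whether that specification is appropriate for the asker's model is an assumption on my part. The example uses the Grunfeld data shipped with plm as a stand-in, and the column names lvalue_sq and lv_x_lc are made up for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Build the terms explicitly so it is clear what enters the model
Grunfeld$lvalue    <- log(Grunfeld$value)
Grunfeld$lcapital  <- log(Grunfeld$capital)
Grunfeld$lvalue_sq <- Grunfeld$lvalue^2                   # (log x)^2, not log(x^2)
Grunfeld$lv_x_lc   <- Grunfeld$lvalue * Grunfeld$lcapital # log(x1) * log(x2), not log(x1 * x2)

mod_sq <- plm(inv ~ lvalue + lvalue_sq + lcapital + lv_x_lc,
              data = Grunfeld,
              index = c("firm", "year"),
              model = "within",
              effect = "twoways")

coef(mod_sq)  # all four slope coefficients should be estimated
```

Such terms can still be highly correlated with the plain logs, so standard errors may be large, but that is a different issue from perfect collinearity.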
See also the reproducible example below, where I just use log(x^2) = 2*log(x). Linear dependence can be detected, e.g., via the function detect.lindep from package plm (see also below). Coefficients being dropped from the estimation also hints at collinear columns in the model matrix. At times, linear dependence appears only after the data transformations involved in the estimation functions; for an example with the within transformation, see the Examples section of the help page ?detect.lindep.
library(plm)
data("Grunfeld")
pGrun <- pdata.frame(Grunfeld)
pGrun$lvalue <- log(pGrun$value) # log(x)
pGrun$lvalue2 <- log(pGrun$value^2) # log(x^2) == 2 * log(x)
mod <- plm(inv ~ lvalue + lvalue2 + capital, data = pGrun, model = "within")
summary(mod)
#> Oneway (individual) effect Within Model
#>
#> Call:
#> plm(formula = inv ~ lvalue + lvalue2 + capital, data = pGrun,
#> model = "within")
#>
#> Balanced Panel: n = 10, T = 20, N = 200
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -186.62916 -20.56311 -0.17669 20.66673 300.87714
#>
#> Coefficients: (1 dropped because of singularities)
#> Estimate Std. Error t-value Pr(>|t|)
#> lvalue 30.979345 17.592730 1.7609 0.07988 .
#> capital 0.360764 0.020078 17.9678 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 2244400
#> Residual Sum of Squares: 751290
#> R-Squared: 0.66525
#> Adj. R-Squared: 0.64567
#> F-statistic: 186.81 on 2 and 188 DF, p-value: < 2.22e-16
detect.lindep(mod) # run on the model
#> [1] "Suspicious column number(s): 1, 2"
#> [1] "Suspicious column name(s): lvalue, lvalue2"
detect.lindep(pGrun) # run on the data
#> [1] "Suspicious column number(s): 6, 7"
#> [1] "Suspicious column name(s): lvalue, lvalue2"
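As a cross-check outside plm, base R's alias() reports which coefficient is a linear combination of which others in an lm fit. The sketch below uses hypothetical simulated data and firm dummies (the least-squares dummy variable form, which gives the same slope estimates as a one-way within model); the names d, lx, and lx2 are made up for illustration.

```r
# Hypothetical small panel, for illustration only
set.seed(42)
n_firms <- 5
n_years <- 8
d <- data.frame(
  firm = factor(rep(seq_len(n_firms), each = n_years)),
  x    = runif(n_firms * n_years, min = 1, max = 10)
)
d$y   <- 1 + 0.5 * log(d$x) + rnorm(nrow(d))
d$lx  <- log(d$x)    # log(x)
d$lx2 <- log(d$x^2)  # log(x^2) == 2 * log(x)

# Firm dummies play the role of the individual fixed effects
fit <- lm(y ~ lx + lx2 + firm, data = d)

al <- alias(fit)
al$Complete  # rows: aliased terms; columns: the coefficients they depend on
```

The row for lx2 should show a weight of about 2 on lx, which is exactly the identity log(x^2) = 2 * log(x).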
Upvotes: 1