user17497170

Reputation: 11

Including nonlinearity in a fixed effects model in plm

I am trying to build a fixed effects regression with the plm package in R. I am using country level panel data with year and country fixed effects.

My problem concerns two explanatory variables: one is an interaction term of two variables, and the other is the squared term of one of the variables.

The model is basically: y = x1 + x1^2 + x3 + x1*x3 + ... + xn, with all variables in log form.

It is central to the model to include the squared term, but when I run the regression it always gets excluded because of "singularities", as x1 and x1^2 are obviously correlated. That is, the regression runs and I get estimates for my other variables, just not for x1^2 and x1*x2. How do I circumvent this?

library(plm)
fe_reg <- plm(log(y) ~ log(x1) + log(x2) + log(x2^2) + log(x1*x2) + dummy,
              data = df,
              index = c("country", "year"),
              model = "within",
              effect = "twoways")
summary(fe_reg)

I have tried defining the interaction and squared terms as vectors, which helped with the interaction term but not the squared term.

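library(dplyr)
# Standardize x1 and x2; the second call starts from df1.pd so the scaled x1 is not overwritten
df1.pd <- df1 %>% mutate_at(c('x1'), ~(scale(.) %>% as.vector))
df1.pd <- df1.pd %>% mutate_at(c('x2'), ~(scale(.) %>% as.vector))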

Upvotes: 1

Views: 414

Answers (1)

Helix123

Reputation: 3687

You just found two properties of the logarithm function:

log(x^2) = 2 * log(x)

log(x*y) = log(x) + log(y)

Hence log(x^2) is perfectly collinear with log(x), since it is just 2*log(x), and one of the two collinear variables is dropped from the estimation. The same holds for log(x*y), which is an exact linear combination of log(x) and log(y).
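To make the collinearity concrete, here is a minimal base R sketch on simulated data. The vectors x and y are hypothetical stand-ins for the asker's regressors, not their actual data. It checks both identities numerically and compares the rank of the design matrix with its number of columns.

```r
# Simulated strictly positive data (hypothetical stand-ins for the regressors)
set.seed(42)
x <- runif(50, min = 1, max = 10)
y <- runif(50, min = 1, max = 10)

# Both identities hold up to floating-point error
id_square  <- isTRUE(all.equal(log(x^2), 2 * log(x)))
id_product <- isTRUE(all.equal(log(x * y), log(x) + log(y)))

# Design matrix: intercept, log(x), log(y), log(x^2), log(x*y)
X <- cbind(1, log(x), log(y), log(x^2), log(x * y))

# The numerical rank is lower than the number of columns,
# because the last two columns add no new information
rank_X <- qr(X)$rank
c(columns = ncol(X), rank = rank_X)
```

A rank below the column count is what the "singularities" message in the estimation output reports.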

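So the model as specified is not estimable by linear regression methods: the log of the square and the log of the product carry no information beyond log(x) and log(y). You might want to consider a transformation other than log for these terms, or use the original (untransformed) variables.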
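One possible alternative, offered as a sketch and not necessarily what the asker intends, is to square and interact the logged variables instead of logging the square and the product. The example assumes the Grunfeld data shipped with plm as a stand-in for the asker's panel; the choice of inv, value and capital as variables is purely illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

# I(log(value)^2) is the square of the log, which is NOT a linear function
# of log(value); log(value):log(capital) is the product of the two logs.
# Grunfeld columns are used here only as illustrative stand-ins.
mod_nl <- plm(
  log(inv) ~ log(value) + I(log(value)^2) + log(capital) +
    log(value):log(capital),
  data   = Grunfeld,
  index  = c("firm", "year"),
  model  = "within",
  effect = "twoways"
)

summary(mod_nl)

# Check the model matrix for exact linear dependence
detect.lindep(mod_nl)
```

Whether a quadratic in logs is the right functional form is a modelling decision. Note also that log(x) and its square are usually still strongly correlated, just not perfectly so, which can inflate standard errors.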

See also the reproducible example below, where I just used log(x^2) = 2*log(x). Linear dependence can be detected, e.g., via the function detect.lindep from package plm (also shown below). Coefficients being dropped from the estimation also hints at collinear columns in the model matrix. At times, linear dependence appears only after the data transformations involved in the estimation functions; for an example with the within transformation, see the Examples section of the help page ?detect.lindep.

library(plm)
data("Grunfeld")
pGrun <- pdata.frame(Grunfeld)
pGrun$lvalue  <- log(pGrun$value)   # log(x)
pGrun$lvalue2 <- log(pGrun$value^2) # log(x^2) == 2 * log(x)

mod  <- plm(inv ~ lvalue + lvalue2 + capital, data = pGrun, model = "within")
summary(mod)
#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = inv ~ lvalue + lvalue2 + capital, data = pGrun, 
#>     model = "within")
#> 
#> Balanced Panel: n = 10, T = 20, N = 200
#> 
#> Residuals:
#>       Min.    1st Qu.     Median    3rd Qu.       Max. 
#> -186.62916  -20.56311   -0.17669   20.66673  300.87714 
#> 
#> Coefficients: (1 dropped because of singularities)
#>          Estimate Std. Error t-value Pr(>|t|)    
#> lvalue  30.979345  17.592730  1.7609  0.07988 .  
#> capital  0.360764   0.020078 17.9678  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    2244400
#> Residual Sum of Squares: 751290
#> R-Squared:      0.66525
#> Adj. R-Squared: 0.64567
#> F-statistic: 186.81 on 2 and 188 DF, p-value: < 2.22e-16

detect.lindep(mod) # run on the model 
#> [1] "Suspicious column number(s): 1, 2"
#> [1] "Suspicious column name(s):   lvalue, lvalue2"

detect.lindep(pGrun) # run on the data
#> [1] "Suspicious column number(s): 6, 7"
#> [1] "Suspicious column name(s):   lvalue, lvalue2"
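The remark above about linear dependence that appears only after a transformation can be illustrated with a hand-rolled base R sketch. This is not the example from ?detect.lindep; the data are simulated and the helper demean is a hypothetical name introduced here. A variable that is constant within each individual is harmless in the raw data but becomes an all-zero column after within-demeaning.

```r
# Simulated balanced panel: 5 individuals, 4 periods (hypothetical data)
set.seed(1)
df <- data.frame(id = rep(1:5, each = 4), year = rep(1:4, times = 5))
df$x <- rnorm(nrow(df))          # varies over time
df$z <- rep(1:5, each = 4)       # constant within each individual

X <- as.matrix(df[, c("x", "z")])
rank_raw <- qr(X)$rank           # both columns are informative in the raw data

# Within transformation by hand: subtract the individual-specific mean
demean <- function(v, g) v - ave(v, g)
Xw <- apply(X, 2, demean, g = df$id)

rank_within <- qr(Xw)$rank       # z is now all zeros, so the rank drops
c(raw = rank_raw, within = rank_within)
```

This is the same reason time-invariant regressors cannot be estimated in a within model with individual effects.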

Upvotes: 1
