I am currently trying to get used to the plm package and tried to estimate a fixed effects model with individual effects (just for the sake of doing it, please ignore the misspecification), first with the plm() function and then with the lm() function. I found that I can only replicate the plm() results when I include a dummy variable for EACH of the N individuals in the lm() regression. As far as I know, only N-1 dummy variables should ever be included in the regression. Does anyone know how plm handles the individual fixed effects? The same is true for time fixed effects, by the way.
Here is my code using the example data from Grunfeld (1958), which is also included in the plm package. Please excuse the rather clumsy dummy variable creation:
################################################################################
## Fixed Effects Estimation with plm() and lm() with individual effects
################################################################################
# Prepare R sheet
library(plm)
library(dplyr)
################################################################################
# Get data
data<-read.csv("http://people.stern.nyu.edu/wgreene/Econometrics/grunfeld.csv")
class(data)
data.tbl <- as_tibble(data)  # as.tbl() is deprecated in current dplyr
#I = Investment
#F = Real Value of the Firm
#C = Real Value of the Firm's Capital Stock
################################################################################
# create firm (individual) dummies
firmdum<-rbind(matrix(rep(c(1,0,0,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,1,0,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,1,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,1,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,1,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,1,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,1,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,1,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,0,1,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,0,0,1),20),ncol = 10,byrow = T)
)
colnames(firmdum)<-paste("firm",c(1:10),sep = "")
firmdum.tbl <- as_tibble(firmdum)  # tbl_df() is deprecated in current dplyr
firmdum.tbl<-sapply(firmdum.tbl, as.integer)
###############################################################################################
# Estimation with individual fixed effects (plm)
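dataset <- as_tibble(cbind(data.tbl, firmdum.tbl))
# index = c("FIRM", "YEAR") states explicitly which columns identify individual and time
est1 <- plm(I ~ F + C, data = dataset, index = c("FIRM", "YEAR"),
            model = "within", effect = "individual")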
summary(est1)
plot(residuals(est1))
# Replication with lm
individualeffects <- as_tibble(cbind(data.tbl, firmdum.tbl))
est2<-lm(I ~ . -1 -FIRM -YEAR, individualeffects)
summary(est2)
plot(residuals(est2))
# Now exclude 1 dummy (as should be done in fixed effects)
individualeffects <- as_tibble(cbind(data.tbl, firmdum.tbl))
est3<-lm(I ~ . -1 -FIRM -YEAR -firm1, individualeffects)
summary(est3)
plot(residuals(est3))
The difference is marginal, but it would be interesting to know how the plm() function handles fixed effects. I ran into a problem when running tests on the model, which did not arise when I estimated the fixed effects model with lm() while excluding one year dummy and one individual dummy. I'd appreciate any help or recommendations!
For your third estimation (est3), excluding one dummy and also excluding the intercept gives you different results. The practice of excluding one dummy (taking N-1 dummies) makes sense when there is an intercept in the model, because otherwise the regressors become linearly dependent: if you add up all dummy columns you get a column of all 1's, i.e. the intercept (the so-called dummy variable trap). If there is no intercept, you want all N dummies in your model:
est4 <- lm(I ~ . -1 -FIRM -YEAR, individualeffects)
summary(est4)
This (est4, which is the same model as your est2) gives the same estimates as the plm() approach.
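To see the dummy variable trap in isolation, here is a small sketch on simulated data. The panel, the variable names y, x and firm, and the true coefficients are all made up for illustration; this is not the Grunfeld data. It fits the three variants from the question with base R's lm(): an intercept plus N-1 firm dummies, no intercept plus all N dummies, and no intercept with one dummy dropped (the est3 situation).

```r
# Hypothetical balanced panel: 5 firms observed for 8 years each.
set.seed(1)
n_firm <- 5
n_year <- 8
panel <- data.frame(
  firm = factor(rep(seq_len(n_firm), each = n_year)),
  x    = rnorm(n_firm * n_year)
)
# True model: a firm-specific intercept plus a common slope of 2.
firm_effect <- c(-3, 0, 1, 4, 6)
panel$y <- firm_effect[panel$firm] + 2 * panel$x + rnorm(nrow(panel), sd = 0.5)

# (a) Intercept plus N-1 dummies: R drops the first firm level itself.
fit_int <- lm(y ~ x + firm, data = panel)
# (b) No intercept, all N dummies (like est2 / est4).
fit_all <- lm(y ~ x + firm - 1, data = panel)
# (c) No intercept AND firm 1's dummy dropped (like est3):
#     this silently forces firm 1's intercept to be zero.
dummies  <- model.matrix(~ firm - 1, data = panel)   # columns firm1..firm5
bad_data <- data.frame(y = panel$y, x = panel$x, dummies[, -1])
fit_bad  <- lm(y ~ . - 1, data = bad_data)

# Compare the slope on x across the three specifications.
c(with_intercept = coef(fit_int)[["x"]],
  all_dummies    = coef(fit_all)[["x"]],
  est3_style     = coef(fit_bad)[["x"]])
```

The first two fits span the same column space, so they agree on the slope and on every fitted value; the third is a restricted version of the second, so its slope and residual sum of squares generally differ.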
By the way: it is easier to let R create the dummies for you by using a factor:
est5 <- lm(I ~ F + C + factor(FIRM), data = individualeffects)
summary(est5)
[...]
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -70.29672 49.70796 -1.414 0.159
F 0.11012 0.01186 9.288 < 2e-16 ***
C 0.31007 0.01735 17.867 < 2e-16 ***
factor(FIRM)2 172.20253 31.16126 5.526 1.08e-07 ***
factor(FIRM)3 -165.27512 31.77556 -5.201 5.14e-07 ***
factor(FIRM)4 42.48742 43.90988 0.968 0.334
factor(FIRM)5 -44.32010 50.49226 -0.878 0.381
factor(FIRM)6 47.13542 46.81068 1.007 0.315
factor(FIRM)7 3.74324 50.56493 0.074 0.941
factor(FIRM)8 12.75106 44.05263 0.289 0.773
factor(FIRM)9 -16.92555 48.45327 -0.349 0.727
factor(FIRM)10 63.72887 50.33023 1.266 0.207
[...]
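Notice: there is no factor(FIRM)1. Because the model has an intercept, R drops the first level (firm 1) as the reference category, so the intercept is firm 1's effect and each factor(FIRM)k coefficient is the difference between firm k's effect and firm 1's.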
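A small sketch of that reference-level logic, again on hypothetical simulated data (names and values invented for illustration): adding the intercept to each factor coefficient recovers the firm-specific intercepts that the no-intercept, all-dummies model reports directly.

```r
# Hypothetical panel: 4 firms observed for 6 periods each.
set.seed(2)
panel <- data.frame(
  firm = factor(rep(1:4, each = 6)),
  x    = rnorm(24)
)
panel$y <- c(10, 12, 7, 15)[panel$firm] + 0.5 * panel$x + rnorm(24)

fit_int <- lm(y ~ x + firm, data = panel)      # intercept + firm2..firm4
fit_all <- lm(y ~ x + firm - 1, data = panel)  # firm1..firm4, no intercept

b <- coef(fit_int)
# Firm 1 is the reference level, so its intercept is "(Intercept)";
# every other firm's intercept is "(Intercept)" plus its own coefficient.
firm_intercepts <- c(b[["(Intercept)"]],
                     b[["(Intercept)"]] + b[c("firm2", "firm3", "firm4")])
names(firm_intercepts) <- paste0("firm", 1:4)

firm_intercepts
coef(fit_all)[paste0("firm", 1:4)]
```

For the plm within model, fixef(est1) should return the corresponding firm-specific intercepts, although the exact print format may depend on the plm version.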
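So much for the replication you asked for. You also asked how this is handled in the plm package: not by introducing dummy variables, but by de-meaning the data per individual (the "within" transformation: subtracting each firm's mean over time from I, F and C before running OLS), which yields the same slope estimates. As far as I know, the theory behind this equivalence is the Frisch–Waugh–Lovell theorem.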
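The de-meaning step can be sketched in base R on simulated data. This uses a hypothetical panel with two regressors rather than the Grunfeld data, and it mimics what a within estimator does conceptually rather than reproducing plm's internals: each variable has its firm mean subtracted with ave(), and OLS without an intercept on the demeaned data reproduces the dummy-variable slopes and residuals.

```r
# Hypothetical panel: 6 firms, 10 years, two regressors.
set.seed(3)
n_firm <- 6
n_year <- 10
panel <- data.frame(
  firm = factor(rep(seq_len(n_firm), each = n_year)),
  x1   = rnorm(n_firm * n_year),
  x2   = rnorm(n_firm * n_year)
)
alpha <- rnorm(n_firm, sd = 5)  # firm fixed effects
panel$y <- alpha[panel$firm] + 1.5 * panel$x1 - 0.7 * panel$x2 +
  rnorm(nrow(panel))

# Dummy-variable (LSDV) regression, as in est4 / est5.
fit_lsdv <- lm(y ~ x1 + x2 + firm, data = panel)

# Within transformation: subtract each firm's own mean from every variable.
demean <- function(v, g) v - ave(v, g)  # ave() returns group means by default
within_data <- data.frame(
  y  = demean(panel$y,  panel$firm),
  x1 = demean(panel$x1, panel$firm),
  x2 = demean(panel$x2, panel$firm)
)
# No intercept: every demeaned variable already has mean zero within each firm.
fit_within <- lm(y ~ x1 + x2 - 1, data = within_data)

coef(fit_lsdv)[c("x1", "x2")]
coef(fit_within)
```

One caveat: lm() on the demeaned data does not know that N firm means were estimated, so its residual degrees of freedom are too large by N and its standard errors come out slightly too small. plm takes the estimated effects into account, which is why its standard errors match the lm() dummy-variable regression.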