Tomas R
Tomas R

Reputation: 556

fixest vs lm - diffrent results? (difference in difference)

I'm trying to do a 'classic' difference in difference with multiple time periods. The model I want to do is:

y = a + b1x1 + b2_treat + b3_period + b_4(treat*period) + u (eq.1)

So basically I'm testing different setups just to make sure I specify my model in the right way, using different packages. I want to use the fixest-package, so I tried to compare the estimates with the estimates from the standard lm()-package. The results, however, differ -- both coefficients and std.errors.

My question is:

  1. Is either the lm_mod, lm_mod2 or the feols_mod regressions specified correctly (as in eq.1)?

If not, I would appreciate it if anyone can show me how to get the same results in lm() as in feols()!

# libraries
library(fixest)
library(modelsummary)
library(tidyverse)

# load data
data(base_did)

# make df for lm_mod with 5 as the reference-period
base_ref_5 <- base_did %>% 
  mutate(period = as.factor(period)) %>% 
  mutate(period = relevel(period, ref = 5))

# Notice that i use base_ref_5 for the lm model and base_did for the feol_mod.
lm_mod <- lm(y ~ x1 + treat*period, base_ref_5)
lm_mod2 <- lm(y ~ x1 + treat + period + treat*period, base_ref_5)
feols_mod <- feols(y ~ x1 + i(period, treat, ref = 5), base_did)

# compare models
models <- list("lm" = lm_mod, 
               "lm2" = lm_mod2,
               "feols" = feols_mod)

msummary(models, stars = T)

**EDIT:** 
the reason why I created base_ref_5 was so that both regressions would have period 5 as the reference period, if that was unclear.

**EDIT 2**: 
added a third model (lm_mod2) which is much closer, but there is still a difference.

Upvotes: 1

Views: 1039

Answers (1)

Vincent
Vincent

Reputation: 17868

There are two issues, here.

  1. In the lm() model, the period variable is interacted, but treated as a continuous numeric variable. In contrast, calling i(period, treat) treats period as a factor (this is explained clearly in the documentation).
  2. The i() function only includes the interactions, and not the constitutive terms.

Here are two models to illustrate the parallels:

library(fixest)

data(base_did)

lm_mod <- lm(y ~ x1 + factor(period) * factor(treat), base_did)

feols_mod <- feols(y ~ x1 + factor(period) + i(period, treat), base_did)

coef(lm_mod)["x1"]

#>        x1 
#> 0.9799697

coef(feols_mod)["x1"]
#>        x1 
#> 0.9799697

Please note that I only answered the part of your question about parallels between lm and feols. StackOverflow is a programming Q&A site. If you have questions about the proper specification of a statistical model, you might want to ask on CrossValidated.

Upvotes: 2

Related Questions