DunkinDont

Reputation: 93

Recreating Stata's xtreg, fe in R

I have a panel data set that looks like this:

df1 <- data.frame(date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-02", "2020-01-03"),
                  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  price = c(102, 103, 107, 95, 96, 98, 77, 76, 72),
                  dummy = c(0, 1, 0, 0, 1, 0, 0, 1, 0))

        date ID price dummy
1 2020-01-01  A   102     0
2 2020-01-02  A   103     1
3 2020-01-03  A   107     0
4 2020-01-01  B    95     0
5 2020-01-02  B    96     1
6 2020-01-03  B    98     0
7 2020-01-01  C    77     0
8 2020-01-02  C    76     1
9 2020-01-03  C    72     0

I have turned it into panel data using the following code:

df1 <- pdata.frame(df1, index = c("price", "date")) #changed to panel data

df1 <- tibble::rownames_to_column(df1, "date2") #turned numbered row names into column

df1 <- df1 %>%
  arrange(ID, date) #ordered first by ID, then date

I now want to run a fixed-effect linear regression, essentially mirroring the xtreg, fe function in Stata.

I have tried the following code, but keep receiving error messages:

fixed <- plm(price ~ dummy, 
             data = treated_panel,
             model = "within")

Error in as.character.factor(x) : malformed factor

How can I run a fixed-effect regression on my panel data?

Upvotes: 1

Views: 393

Answers (1)

jay.sf

Reputation: 72813

Unlike in Stata, with plm you can specify the unit and time variables directly in the index= argument, which saves you the tedious definition of a pdata.frame. Note that you need effect='twoways' if you want both unit and time fixed effects; the default, effect='individual', gives unit fixed effects only.

library(plm)
fit <- plm(price ~ dummy + X, data=df1, index=c('ID', 'date'), model="within", effect='twoways')

To get robust standard errors, the summary method for plm has a vcov= argument.

summary(fit, vcov=plm::vcovHC(fit))

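Note that I added an X variable to the toy data to make this work: in the original data, dummy takes the same value for every ID on a given date, so it is collinear with the time effects and cannot be estimated on its own with effect='twoways'.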
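
As a rough illustration of what the unit fixed-effects (within) estimator does, here is a base-R sketch on the question's original toy data, assuming you only want ID effects, as I understand Stata's xtreg, fe to give. It demeans price and dummy within each ID and runs OLS on the result, then checks that against a regression with ID dummies (LSDV). The names price_dm, dummy_dm, beta_within and beta_lsdv are made up for this example and are not part of plm.

```r
# Toy panel from the question (base R only, no packages needed).
df1 <- data.frame(
  date  = rep(c("2020-01-01", "2020-01-02", "2020-01-03"), times = 3),
  ID    = rep(c("A", "B", "C"), each = 3),
  price = c(102, 103, 107, 95, 96, 98, 77, 76, 72),
  dummy = c(0, 1, 0, 0, 1, 0, 0, 1, 0)
)

# Within transformation: subtract each unit's own mean from its observations.
df1$price_dm <- df1$price - ave(df1$price, df1$ID)
df1$dummy_dm <- df1$dummy - ave(df1$dummy, df1$ID)

# OLS on the demeaned data, without an intercept.
beta_within <- unname(coef(lm(price_dm ~ 0 + dummy_dm, data = df1)))

# Equivalent least-squares dummy-variable (LSDV) regression with ID dummies.
beta_lsdv <- unname(coef(lm(price ~ dummy + factor(ID), data = df1))["dummy"])

# Both slopes coincide (-1/6 for this toy data).
print(c(within = beta_within, lsdv = beta_lsdv))
```

Only the point estimate carries over from the demeaned lm() fit; its standard errors do not account for the degrees of freedom used up by the unit means, so take standard errors from the LSDV fit or from plm instead.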


Data:

df1 <- structure(list(date = c("2020-01-01", "2020-01-02", "2020-01-03", 
"2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-02", 
"2020-01-03"), ID = c("A", "A", "A", "B", "B", "B", "C", "C", 
"C"), price = c(102, 103, 107, 95, 96, 98, 77, 76, 72), X = c(0.391173265408725, 
0.35144685767591, 0.0459533138200641, 0.626689063152298, 0.523385446285829, 
0.945963381789625, 0.935278508113697, 0.289080709218979, 0.111053846077994
), dummy = c(0, 1, 0, 0, 1, 0, 0, 1, 0)), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 1
