Reputation: 2514
I want to create lags of a variable. In a panel data setting, lags should only be taken within each panel.
Why does plm's lag() not respect the panel structure (by default), and is there a way to change that (without doing it manually with dplyr)?
library(plm)
library(dplyr)
# Load example data
data("EmplUK", package = "plm")
Em <- pdata.frame(EmplUK, index = c("firm", "year"))
# how I think it should have worked
Em$lwage_incorrect <- lag(Em$wage)
# what actually works
Em <- Em %>% group_by(firm) %>% mutate(lwage_correct = lag(wage))
Upvotes: 2
Views: 2209
Reputation: 1411
When I run your code, I get panel-specific lags with both of your methods, so you might want to check it again. I have run into similar trouble before when it wasn't clear which lag function I was actually calling (there is one in base R, one in plm, and one in dplyr, for example). Running Em$lwage = plm::lag(Em$wage) removes this ambiguity.
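A minimal sketch of how one might check which lag is being used. It assumes the cause is dplyr's lag() masking the generic when dplyr is attached after plm, and that plm registers its panel-aware method for pseries objects on the stats::lag generic. The variable names firm_id, lw and first_obs are only for illustration.

```r
library(plm)

data("EmplUK", package = "plm")
Em <- pdata.frame(EmplUK, index = c("firm", "year"))

# Call the generic through its namespace so an attached dplyr cannot mask it.
# Assumption: S3 dispatch on the pseries column reaches plm's panel-aware method.
Em$lwage <- stats::lag(Em$wage)

# Sanity check: the first observation of each firm should have no lagged value.
firm_id   <- as.character(Em$firm)   # illustrative helper
lw        <- as.numeric(Em$lwage)    # drop pseries attributes for plain indexing
first_obs <- !duplicated(firm_id)
stopifnot(all(is.na(lw[first_obs])))
```

If the check fails, the lag was probably computed across the whole column and not within each firm, which would point to a non-panel lag function having been called.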
Upvotes: 5