Reputation: 57
I have a panel dataset in which each ID is observed over a different range of years, with 2018 as the last possible year.
Year ID
2015 111
2016 111
2017 111
2018 111
2003 222
2004 222
2005 222
2006 222
2011 333
2012 333
2013 333
2014 333
I would like to add a third column, a dummy variable that takes the value 1 in the last year an ID is observed, but only if that year is before 2018 (the end of my observation period). The result should look like this:
Year ID Dummy
2015 111 0
2016 111 0
2017 111 0
2018 111 0
2003 222 0
2004 222 0
2005 222 0
2006 222 1
2011 333 0
2012 333 0
2013 333 0
2014 333 1
I need this to prepare my panel data for a survival analysis. My idea was an if statement that checks whether the ID in the next row differs from the ID in the current row while the current year is not 2018, but I can't work out the code. Can someone help?
Upvotes: 2
Views: 64
Reputation: 887501
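An option with dplyr (part of the tidyverse): within each ID, flag the row whose Year is the group maximum, provided that year is before 2018.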
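library(dplyr)
df1 %>%
  group_by(ID) %>%
  # TRUE only in each ID's last year when that year is before 2018; unary + converts TRUE/FALSE to 1/0
  mutate(Dummy = +(Year == max(Year) & Year < 2018)) %>%
  ungroup()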
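If dplyr is not available, the same grouped comparison can be sketched in base R with ave(). This is only an illustration: df1 is rebuilt here from the example data in the question, and last_year is a helper name introduced for this sketch.

```r
# Example data from the question (df1 is the name assumed in the answer above)
df1 <- data.frame(
  Year = c(2015:2018, 2003:2006, 2011:2014),
  ID   = rep(c(111, 222, 333), each = 4)
)

# ave() returns, for every row, the maximum Year within that row's ID
last_year <- ave(df1$Year, df1$ID, FUN = max)

# 1 if the row is the ID's final year and that year is before 2018, else 0
df1$Dummy <- as.integer(df1$Year == last_year & df1$Year < 2018)
```

Because ave() keeps the original row order, the result lines up with the data frame without any sorting or merging.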
Upvotes: 2
Reputation: 28695
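library(data.table)
setDT(df)  # convert df to a data.table by reference
# within each ID, flag the last observed year if it is before 2018
df[, Dummy := as.integer(Year == max(Year) & Year < 2018), by = ID]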
df
# Year ID Dummy
# 1: 2015 111 0
# 2: 2016 111 0
# 3: 2017 111 0
# 4: 2018 111 0
# 5: 2003 222 0
# 6: 2004 222 0
# 7: 2005 222 0
# 8: 2006 222 1
# 9: 2011 333 0
# 10: 2012 333 0
# 11: 2013 333 0
# 12: 2014 333 1
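The row-wise idea from the question (flag a row when the next row belongs to a different ID) can also be sketched in base R. This assumes the data are sorted by ID and Year, which the sketch enforces itself; panel, n and is_last are names made up for this illustration.

```r
# Example data from the question, under a hypothetical name
panel <- data.frame(
  Year = c(2015:2018, 2003:2006, 2011:2014),
  ID   = rep(c(111, 222, 333), each = 4)
)

# The comparison with the next row only works on sorted data
panel <- panel[order(panel$ID, panel$Year), ]
n <- nrow(panel)

# TRUE where the next row has a different ID; the final row is always a last row
is_last <- c(panel$ID[-1] != panel$ID[-n], TRUE)

# Flag the last row of each ID only if its year is before 2018
panel$Dummy <- as.integer(is_last & panel$Year < 2018)
```

The grouped max(Year) comparison above does not depend on row order, so it is probably the safer choice when the sort order of the data is not guaranteed.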
Upvotes: 3