Reputation: 7167
I have a panel data from year t1
to t2
. Some individuals enter the sample after t1
and/or exit the sample before t2
. For efficiency (large sample), the dataset only contains rows for years when individuals are observed.
I want to add a new observation per individual, containing the year after an individual left the sample. So, if someone left in, say 2003, I want the new observation to contain the individual's id and the value 2004
in the year variable. Every other variable in that observation should be missing.
This is my approach, using a sample dataset:
webuse nlswork, clear
* Here goes plenty of lines of codes modifying the dataset ... for generality *
timer on 1
preserve
keep id year
bysort id (year) : keep if _n == _N
replace year = year + 1
save temp.dta, replace
restore
append using temp.dta
sort id year
erase temp.dta
timer off 1
timer list
I think this might be a bit inefficient, as it includes a preserve/restore, saving/deleting an additional database, and an append, all relatively time-consuming actions. Something like tsfill, last
would be amazing, but that option doesn't exist. Is anyone aware of a more efficient method? The code above includes timer, so anyone can benchmark it against another method.
Upvotes: 1
Views: 6833
Reputation: 37208
I am never that impressed by attempts to save seconds when coding takes minutes. This is more direct than your approach.
bysort id (year) : gen byte last = _n == _N
expand 2 if last
bysort id (year) : replace year = year + 1 if _n == _N
EDIT: You need to loop over the other variables in your dataset to replace their values with missing. For simplicity, I will assume that they are all numeric.
bysort id (year) : replace last = _n == _N
ds id year, not
quietly foreach v in `r(varlist)' {
replace `v' = . if last
}
Upvotes: 1