I'm using Stata 13 and need to clean a panel dataset in which different ids are observed in some of the years from 2000 to 2003. My data look like this:
id year ln_wage
1 2000 2.30
1 2001 2.31
1 2002 2.31
2 2001 1.89
2 2002 1.89
2 2003 2.10
3 2002 1.60
4 2002 2.46
4 2003 2.47
5 2000 2.10
5 2001 2.10
5 2003 2.12
For each year, I would like to keep only the individuals who also appear in the previous year (t-1). As a result, the first year of the sample (2000) is dropped entirely. I'm looking for output like:
2001:
id year ln_wage
1 2001 2.31
5 2001 2.10
2002:
id year ln_wage
1 2002 2.31
2 2002 1.89
2003:
id year ln_wage
2 2003 2.10
4 2003 2.47
Regards,
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int year float ln_wage
1 2000 2.3
1 2001 2.31
1 2002 2.31
2 2001 1.89
2 2002 1.89
2 2003 2.1
3 2002 1.6
4 2002 2.46
4 2003 2.47
5 2000 2.1
5 2001 2.1
5 2003 2.12
end
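xtset id year                 // declare id as the panel variable and year as the time variable
drop if missing(L.ln_wage)    // L.ln_wage is missing when the id has no observation in year - 1
sort year id
list, noobs sepby(year)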
+---------------------+
| id year ln_wage |
|---------------------|
| 1 2001 2.31 |
| 5 2001 2.1 |
|---------------------|
| 1 2002 2.31 |
| 2 2002 1.89 |
|---------------------|
| 2 2003 2.1 |
| 4 2003 2.47 |
+---------------------+
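// Alternatively, run this on the original data instead of the -xtset- approach
// (assumes no id has the same year twice):
bysort id (year): gen byte todrop = year[_n-1] != year - 1   // also 1 for each id's first row, since year[0] is missing
drop if todrop
drop todrop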
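One caveat about `drop if missing(L.ln_wage)`: `L.ln_wage` is also missing when the id does have a row in t-1 but `ln_wage` itself is missing there, so such rows are dropped too. If all that matters is whether the individual appears in t-1, a minimal sketch is to lag a variable that is never missing. The helper variable `present` is a hypothetical name, and the data below are the question's example except that `ln_wage` for id 1 in 2000 is set to missing on purpose to show the difference:

```stata
* Sketch: keep rows whose id has ANY row in year - 1,
* regardless of whether ln_wage is missing in that row.
* Data: the question's example, with ln_wage for id 1 in 2000
* deliberately set to missing (hypothetical) for illustration.
clear
input byte id int year float ln_wage
1 2000 .
1 2001 2.31
1 2002 2.31
2 2001 1.89
2 2002 1.89
2 2003 2.1
3 2002 1.6
4 2002 2.46
4 2003 2.47
5 2000 2.1
5 2001 2.1
5 2003 2.12
end

isid id year                 // exits with an error if an id repeats a year
gen byte present = 1         // hypothetical helper: equals 1 on every existing row
xtset id year
drop if missing(L.present)   // missing only when the id has no row in year - 1
drop present
sort year id
list, noobs sepby(year)
```

With these data, `drop if missing(L.ln_wage)` would additionally drop id 1 in 2001, whereas the sketch keeps it because the 2000 row exists. The `bysort` alternative behaves like the sketch, since it looks only at `year`.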