Reputation: 205
I have the following minimal example:
input str5 name year match1 match2 match3
Alice 2000 . . .
Alice 2000 . . .
Bob 2000 . . .
Carol 2001 0 . .
Alice 2002 0 1 .
Carol 2002 1 0 .
Bob 2003 0 0 1
Bob 2003 0 0 1
end
I have data on name and year, and I want to create binary variables called match'year' that equals 1 if this name is in the data previous 'year'. For example, looking at the first observation in Stata, match1 is a binary variable that equals 1 if Alice appears in year 1999, and match2 is a binary variable that equals 1 if Alice appears in 1998, etc.
If there is no year prior to that year (in this case there is no 1999 or 1998), the binary variable will be missing.
How can I construct these match variables? Note that I have millions of unique names, and using command levelsof name, local(match)
results in macro substitution results in line that is too long
error. Also note that there are sometimes duplicates of names in a given year, and some names may be missing in a given year.
Upvotes: 0
Views: 218
Reputation: 24722
Here is an alternative approach using frames
:
keep name year
frame copy default prev
frame prev: duplicates drop
frame prev: rename year myear
gen myear=.
forvalues i=1/3 {
replace myear = year-`i'
frlink m:1 name myear, frame(prev) generate(match`i')
replace match`i' = 1 if match`i'!=.
}
drop myear
Output:
name year match1 match2 match3
1. Alice 2000 . . .
2. Alice 2000 . . .
3. Bob 2000 . . .
4. Carol 2001 . . .
5. Alice 2002 . 1 .
6. Carol 2002 1 . .
7. Bob 2003 . . 1
8. Bob 2003 . . 1
Upvotes: 1
Reputation: 37183
Thanks for the data example. Here is some technique using rangestat
from SSC. I don't understand your rule on which values should be 0 and which missing.
* Example generated by -dataex-. For more info, type help dataex
clear
input str5 name float year
"Alice" 2000
"Alice" 2000
"Alice" 2002
"Bob" 2000
"Bob" 2003
"Bob" 2003
"Carol" 2001
"Carol" 2002
end
gen one = 1
forval j = 1/3 {
rangestat (max) match`j'=one, int(year -`j' -`j') by(name)
}
drop one
sort name year
list, sepby(year)
+-----------------------------------------+
| name year match1 match2 match3 |
|-----------------------------------------|
1. | Alice 2000 . . . |
2. | Alice 2000 . . . |
|-----------------------------------------|
3. | Alice 2002 . 1 . |
|-----------------------------------------|
4. | Bob 2000 . . . |
|-----------------------------------------|
5. | Bob 2003 . . 1 |
6. | Bob 2003 . . 1 |
|-----------------------------------------|
7. | Carol 2001 . . . |
|-----------------------------------------|
8. | Carol 2002 1 . . |
+-----------------------------------------+
As the original author of levelsof
I find it a little melancholy to see it pressed into service where it is of little or no help.
Upvotes: 3