I have two datasets that I have appended together in Stata.
Both datasets contain one variable, say Age. I sorted the combined data so that the ages are in ascending order. I want to delete the observations whose ages have no matching age in the other dataset.
Dataset 1:
Obs Age
1 7
2 8
3 10
4 5
Dataset 2:
Obs Age
1 10
2 5
3 9
4 7
Combined and sorted in ascending order:
Obs Age
1 5
2 5
3 7
4 7
5 8
6 9
7 10
8 10
Because the sorted ages in observations 5 and 6 (8 and 9) have no matching partner, I want to delete them. Essentially, I want a way to loop through pairs of adjacent observations and compare their values so that I'm left only with pairs that have the same age.
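Looping over observations is inefficient and, in the vast majority of cases, unnecessary. In Stata you can compare each observation with its neighbours directly by subscripting a variable with _n, for example age[_n-1] and age[_n+1].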
The following works for me:
clear
input age
5
5
7
7
8
9
10
10
end
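* the neighbour comparison assumes the data are sorted by age
sort age
* tag = 1 if age differs from both the next and the previous observation;
* age[_n-1] and age[_n+1] are missing at the first and last observations
generate tag = age != age[_n+1] & age != age[_n-1]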
list
+-----------+
| age tag |
|-----------|
1. | 5 0 |
2. | 5 0 |
3. | 7 0 |
4. | 7 0 |
5. | 8 1 |
|-----------|
6. | 9 1 |
7. | 10 0 |
8. | 10 0 |
+-----------+
After dropping the tagged observations, you get the desired result:
keep if tag == 0
list
+-----------+
| age tag |
|-----------|
1. | 5 0 |
2. | 5 0 |
3. | 7 0 |
4. | 7 0 |
5. | 10 0 |
|-----------|
6. | 10 0 |
+-----------+
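One caveat: the tag above only checks whether an age occurs more than once, not whether the copies come from different datasets, so two 8s that both came from Dataset 1 would be kept. The following is a minimal sketch that records where each observation came from and keeps only ages present in both datasets. It assumes a Stata version whose append supports the generate() option; the tempfile name data2 and the variable name source are illustrative choices, not part of the original answer.

```stata
clear
* Dataset 2 from the question, saved to a temporary file
input age
10
5
9
7
end
tempfile data2
save `data2'

clear
* Dataset 1 from the question
input age
7
8
10
5
end

* source = 0 for the data in memory (Dataset 1), 1 for the appended file (Dataset 2)
append using `data2', generate(source)

* within each age group sorted by source, the first and last observations
* differ only if the age occurs in both datasets
bysort age (source): keep if source[1] != source[_N]
list
```

With the question's data this drops ages 8 and 9 and leaves the same six observations as above. If one dataset contains an age several times, every copy is kept as long as the other dataset has at least one observation with that age.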