Reputation: 342
I have to append three datasets named A
, B
and C
that contain data for various years (for example, 1990, 1991...2014
).
The problem is that not all datasets contain all the survey years and therefore the unmatched years need to be dropped manually before appending.
I would like to know if there is any way to append three (or more) datasets that will keep only the matched variables across the datasets (years in this case).
Upvotes: 0
Views: 90
Reputation:
Consider the following toy example:
clear
input year var
1995 0
1996 1
1997 2
1998 3
1999 4
2000 5
end
save data1, replace
clear
input year var
1995 6
1996 9
1998 7
1999 8
2000 9
end
save data2, replace
clear
input year var
1995 10
1996 11
1997 12
2000 13
end
save data3, replace
There is no option that will force append
to do what you want, but you can do the following:
use data1, clear
append using data2 data3
duplicates tag year, generate(tag)
sort year
list
+------------------+
| year var tag |
|------------------|
1. | 1995 0 2 |
2. | 1995 6 2 |
3. | 1995 10 2 |
4. | 1996 9 2 |
5. | 1996 1 2 |
|------------------|
6. | 1996 11 2 |
7. | 1997 2 1 |
8. | 1997 12 1 |
9. | 1998 7 1 |
10. | 1998 3 1 |
|------------------|
11. | 1999 8 1 |
12. | 1999 4 1 |
13. | 2000 13 2 |
14. | 2000 5 2 |
15. | 2000 9 2 |
+------------------+
drop if tag == 1
list
+------------------+
| year var tag |
|------------------|
1. | 1995 0 2 |
2. | 1995 6 2 |
3. | 1995 10 2 |
4. | 1996 9 2 |
5. | 1996 1 2 |
|------------------|
6. | 1996 11 2 |
7. | 2000 13 2 |
8. | 2000 5 2 |
9. | 2000 9 2 |
+------------------+
You can also further generalize this approach by finding the maximum value of the variable tag
and keeping all observations with that value:
summarize tag
keep if tag == `r(max)'
Upvotes: 2