Ghose Bishwajit
Ghose Bishwajit

Reputation: 342

Appending datasets by matched variables

I have to append three datasets named A, B and C that contain data for various years (for example, 1990, 1991...2014).

The problem is that not all datasets contain all the survey years and therefore the unmatched years need to be dropped manually before appending.

I would like to know if there is any way to append three (or more) datasets that will keep only the matched variables across the datasets (years in this case).

Upvotes: 0

Views: 90

Answers (1)

user8682794
user8682794

Reputation:

Consider the following toy example:

clear

input year var
1995 0
1996 1
1997 2
1998 3 
1999 4
2000 5
end

save data1, replace

clear

input year var
1995 6
1996 9
1998 7 
1999 8
2000 9
end

save data2, replace

clear

input year var
1995 10
1996 11
1997 12
2000 13
end

save data3, replace

There is no option that will force append to do what you want, but you can do the following:

use data1, clear

append using data2 data3
duplicates tag year, generate(tag)

sort year
list

     +------------------+
     | year   var   tag |
     |------------------|
  1. | 1995     0     2 |
  2. | 1995     6     2 |
  3. | 1995    10     2 |
  4. | 1996     9     2 |
  5. | 1996     1     2 |
     |------------------|
  6. | 1996    11     2 |
  7. | 1997     2     1 |
  8. | 1997    12     1 |
  9. | 1998     7     1 |
 10. | 1998     3     1 |
     |------------------|
 11. | 1999     8     1 |
 12. | 1999     4     1 |
 13. | 2000    13     2 |
 14. | 2000     5     2 |
 15. | 2000     9     2 |
     +------------------+

drop if tag == 1
list

     +------------------+
     | year   var   tag |
     |------------------|
  1. | 1995     0     2 |
  2. | 1995     6     2 |
  3. | 1995    10     2 |
  4. | 1996     9     2 |
  5. | 1996     1     2 |
     |------------------|
  6. | 1996    11     2 |
  7. | 2000    13     2 |
  8. | 2000     5     2 |
  9. | 2000     9     2 |
     +------------------+

You can also further generalize this approach by finding the maximum value of the variable tag and keeping all observations with that value:

summarize tag
keep if tag == `r(max)'

Upvotes: 2

Related Questions