Albert Lutakome
Albert Lutakome

Reputation: 41

Merging two datasets with variables in one corresponding observations in the other

I ran into a challenge today and I hope to get some help. I want to merge 2 data sets. Dataset1 contains the member roster with 3 variables, the batch_id, member_num in the batch and occupation. Dataset2 consists of member license statuses.

The challenge here is that in dataset2 the member_num is represented as a variable in a way that member_num_x in dataset2 corresponds to "x" observation under variable member_num in dataset1. I need to merge these two datasets so that in the end I have one dataset that has batch_id, member_num, occupation, and license status for each member.


Dataset1
| batch_id   | member_num | occupation |
| --------   | --------   |  -------- 
| A01        |   1        | Driver     |
| A01        |   2        | Driver     |
| A01        |   3        | Driver     |
| A01        |   4        | Driver     |
| A02        |   1        | Navigator  |
| A02        |   2        | Navigator  |


Dataset2
| batch_id |member_num_1|member_num_2|member_num_3|member_num_4|
| -------- | --------   | --------   | --------   | --------   |         
| A01      | Yes        |   NA       |   Yes      |   No       | 
| A02      | No    |    |   NA       |


Desired Output 

| batch_id   | member_num | occupation | License_status
| --------   | --------   |  -------- 
| A01        |   1        | Driver     | Yes
| A01        |   2        | Driver     | NA
| A01        |   3        | Driver     | Yes
| A01        |   4        | Driver     | No
| A02        |   1        | Navigator  | No
| A02        |   2        | Navigator  | NA

I have tried using the merge command in Stata but there is no option to do this particular kind of merging. The options that are there use unique variables (almost same as joins on primary keys).

Upvotes: 0

Views: 83

Answers (2)

Albert Lutakome
Albert Lutakome

Reputation: 41

The final answer with the minor edit:

clear 

use "Dataset2.dta" reshape long member_num_, i(batch_id) j(member_num) rename member_num_ License_status 
tempfile d2long 
save d2long',replace  

use "Dataset1.dta",clear  

merge m:1 batch_id member_num using d2long',nogenerate keep(1 3)

Upvotes: 0

langtang
langtang

Reputation: 24845

You need to reshape d2 into long format and then merge/link on batch_id, member_num

Approach 1 (using frames)

clear
use d1
frame create d2
frame d2: use d2
frame d2: reshape long member_num_, i(batch_id) j(member_num)
frlink 1:1 batch_id member_num, frame(d2)
frget License_status = member_num_, from(d2)

Approach 2 (using merge)

clear
use d2
reshape long member_num_, i(batch_id) j(member_num)
rename member_num_ License_status
tempfile d2long
save `d2long',replace
use d1,clear
merge 1:1 batch_id member_num using `d2long',nogenerate keep(1 3)

Output:

       batch_id   member~m   occupat~n   d2   Licens~s  
  1.        A01          1      Driver    1        Yes  
  2.        A01          2      Driver    2         NA  
  3.        A01          3      Driver    3        Yes  
  4.        A01          4      Driver    4         No  
  5.        A02          1   Navigator    5         No  
  6.        A02          2   Navigator    6         NA

Input:

d1.dta:

       batch_id   member~m   occupat~n  
  1.        A01          1      Driver  
  2.        A01          2      Driver  
  3.        A01          3      Driver  
  4.        A01          4      Driver  
  5.        A02          1   Navigator  
  6.        A02          2   Navigator  

d2.dta:

       batch_id   member~1   member~2   member~3   member~4  
  1.        A01        Yes         NA        Yes         No  
  2.        A02         No         NA                       

Upvotes: 2

Related Questions