Reputation: 41
I ran into a challenge today and I hope to get some help.
I want to merge
2 data sets. Dataset1 contains the member roster with 3 variables, the batch_id, member_num in the batch and occupation. Dataset2 consists of member license statuses.
The challenge here is that in dataset2 the member_num is represented as a variable in a way that member_num_x in dataset2 corresponds to "x" observation under variable member_num in dataset1. I need to merge
these two datasets so that in the end I have one dataset that has batch_id, member_num, occupation, and license status for each member.
Dataset1
| batch_id | member_num | occupation |
| -------- | -------- | --------
| A01 | 1 | Driver |
| A01 | 2 | Driver |
| A01 | 3 | Driver |
| A01 | 4 | Driver |
| A02 | 1 | Navigator |
| A02 | 2 | Navigator |
Dataset2
| batch_id |member_num_1|member_num_2|member_num_3|member_num_4|
| -------- | -------- | -------- | -------- | -------- |
| A01 | Yes | NA | Yes | No |
| A02 | No | | NA |
Desired Output
| batch_id | member_num | occupation | License_status
| -------- | -------- | --------
| A01 | 1 | Driver | Yes
| A01 | 2 | Driver | NA
| A01 | 3 | Driver | Yes
| A01 | 4 | Driver | No
| A02 | 1 | Navigator | No
| A02 | 2 | Navigator | NA
I have tried using the merge
command in Stata but there is no option to do this particular kind of merging. The options that are there use unique variables (almost same as joins on primary keys).
Upvotes: 0
Views: 83
Reputation: 41
The final answer with the minor edit:
clear
use "Dataset2.dta" reshape long member_num_, i(batch_id) j(member_num) rename member_num_ License_status
tempfile d2long
save d2long',replace
use "Dataset1.dta",clear
merge m:1 batch_id member_num using d2long',nogenerate keep(1 3)
Upvotes: 0
Reputation: 24845
You need to reshape d2 into long format and then merge/link on batch_id, member_num
clear
use d1
frame create d2
frame d2: use d2
frame d2: reshape long member_num_, i(batch_id) j(member_num)
frlink 1:1 batch_id member_num, frame(d2)
frget License_status = member_num_, from(d2)
clear
use d2
reshape long member_num_, i(batch_id) j(member_num)
rename member_num_ License_status
tempfile d2long
save `d2long',replace
use d1,clear
merge 1:1 batch_id member_num using `d2long',nogenerate keep(1 3)
batch_id member~m occupat~n d2 Licens~s
1. A01 1 Driver 1 Yes
2. A01 2 Driver 2 NA
3. A01 3 Driver 3 Yes
4. A01 4 Driver 4 No
5. A02 1 Navigator 5 No
6. A02 2 Navigator 6 NA
d1.dta:
batch_id member~m occupat~n
1. A01 1 Driver
2. A01 2 Driver
3. A01 3 Driver
4. A01 4 Driver
5. A02 1 Navigator
6. A02 2 Navigator
d2.dta:
batch_id member~1 member~2 member~3 member~4
1. A01 Yes NA Yes No
2. A02 No NA
Upvotes: 2