Reputation: 65
I have a primary df that I want to merge into. Let's call it 'primary_df'.
RCID TypeID Data
777 D Hello
777 O Hey
778 O Hey
779 D Hello
primary_df contains an 'RCID' column that matches up with 'O_ID' in another dataframe that only has data of TypeID 'O'. Let's call that df 'o_type_df'
O_ID O_Data
777 Foo
778 Bar
o_type_df has less entries than primary_df. There are repeat values of 'RCID' in primary_df since the same RCID can have different TypeIDs associated with it.
How can I merge o_type_df into primary_df for all rows of TypeID 'O'?
End result should be:
RCID TypeID Data O_ID O_Data
777 D Hello
777 O Hey 777 Foo
778 O Hey 778 Bar
779 D Hello
Code:
primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'], data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])
Upvotes: 3
Views: 676
Reputation: 35636
Try adding an indicator column to o_type_df
:
o_type_df['TypeID'] = 'O'
Then merge
left on those columns:
merged = (
primary_df.merge(o_type_df,
left_on=['RCID', 'TypeID'],
right_on=['O_ID', 'TypeID'],
how='left')
)
merged
:
RCID TypeID Data O_ID O_Data
0 777 D Hello NaN NaN
1 777 O Hey 777.0 Foo
2 778 O Hey 778.0 Bar
3 779 D Hello NaN NaN
Or with assign
:
merged = (
primary_df.merge(o_type_df.assign(TypeID='O'),
left_on=['RCID', 'TypeID'],
right_on=['O_ID', 'TypeID'],
how='left')
)
merged
:
RCID TypeID Data O_ID O_Data
0 777 D Hello NaN NaN
1 777 O Hey 777.0 Foo
2 778 O Hey 778.0 Bar
3 779 D Hello NaN NaN
Upvotes: 3