How to merge two Pandas DataFrames of different size based on condition

Question

I have a primary df that I want to merge into. Let's call it 'primary_df'.

RCID    TypeID    Data
 777         D    Hello
 777         O    Hey
 778         O    Hey
 779         D    Hello

primary_df contains an 'RCID' column that matches up with 'O_ID' in another dataframe that only has data of TypeID 'O'. Let's call that df 'o_type_df'

O_ID   O_Data
 777   Foo
 778   Bar

o_type_df has less entries than primary_df. There are repeat values of 'RCID' in primary_df since the same RCID can have different TypeIDs associated with it.

How can I merge o_type_df into primary_df for all rows of TypeID 'O'?

End result should be:

RCID    TypeID    Data     O_ID   O_Data
 777         D    Hello    
 777         O    Hey      777    Foo
 778         O    Hey      778    Bar
 779         D    Hello

Code:

primary_df = pd.DataFrame(columns=['RCID', 'TypeID', 'Data'], data=[[777, 'D', 'Hello'], [777, 'O', 'Hey'], [778, 'O', 'Hey'], [779, 'D', 'Hello']])
o_type_df = pd.DataFrame(columns=['O_ID', 'O_Data'], data=[[777, 'Foo'], [778, 'Bar']])

Henry Ecker · Accepted Answer

Try adding an indicator column to o_type_df:

o_type_df['TypeID'] = 'O'

Then merge left on those columns:

merged = (
    primary_df.merge(o_type_df,
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)

merged:

   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN

Or with assign:

merged = (
    primary_df.merge(o_type_df.assign(TypeID='O'),
                     left_on=['RCID', 'TypeID'],
                     right_on=['O_ID', 'TypeID'],
                     how='left')
)

merged:

   RCID TypeID   Data   O_ID O_Data
0   777      D  Hello    NaN    NaN
1   777      O    Hey  777.0    Foo
2   778      O    Hey  778.0    Bar
3   779      D  Hello    NaN    NaN

How to merge two Pandas DataFrames of different size based on condition

Answers (1)

Related Questions