cdubbs
cdubbs

Reputation: 45

Combine two dataframes where column values match

I have two dataframes containing similar columns:

  ID  prop1
1 UUU &&&
2 III ***
3 OOO )))
4 PPP %%%

and

  ID  prop2
1 UUU 1234 
2 WWW 4567
3 III 7890
5 EEE 0123
6 OOO 3456
7 RRR 6789
8 PPP 9012

I need to merge these two dataframes where the IDs match, and add the prop2 column to the original.

  ID  prop1 prop1
1 UUU &&&   1234
2 III ***   7890
3 OOO )))   3456
4 PPP %%%   9012

Ive tried every combination of merge, join, concat, for, iter, etc. It will either fail to merge, lose the index, or straight-up drop the column values.

Upvotes: 2

Views: 1680

Answers (2)

acrobat
acrobat

Reputation: 917

You can also use .map to add the prop2 values to your original dataframe, where the ID column values match.

df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy())

Should there be any IDs in your original dataframe that aren't also in the second one (and so don't have a prop2 value to bring across, you can fill those holes by adding .fillna() with the value of your choice.

df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy()).fillna(your_fill_value_here)

Upvotes: 0

SeaBean
SeaBean

Reputation: 23227

You can use pd.merge():

pd.merge(df1, df2, on='ID')

Output:

    ID prop1  prop2
0  UUU   &&&   1234
1  III   ***   7890
2  OOO   )))   3456
3  PPP   %%%   9012

You can also use df.merge() as follows::

df1.merge(df2, on='ID')

Same result.

The default parameter on .merge() no matter using pd.merge() or df.merge() is how='inner'. So you are already doing an inner join without specifying how= parameter.

More complex scenario:

If you require the more complicated situation to maintain the index of df1 1, 2, 3, 4 instead of 0, 1, 2, 3, you can do it by resetting index before merge and then set index on the interim index column produced when resetting index:

df1.reset_index().merge(df2, on='ID').set_index('index')

Output:

        ID prop1  prop2
index                  
1      UUU   &&&   1234
2      III   ***   7890
3      OOO   )))   3456
4      PPP   %%%   9012

Now, the index 1 2 3 4 of original df1 are kept.

Optionally, if you don't want the axis label index appear on top of the row index, you can do a rename_axis() as follows:

df1.reset_index().merge(df2, on='ID').set_index('index').rename_axis(index=None)

Output:

        ID prop1  prop2
1      UUU   &&&   1234
2      III   ***   7890
3      OOO   )))   3456
4      PPP   %%%   9012

Upvotes: 4

Related Questions