Reputation: 45
I have two dataframes containing similar columns:
ID prop1
1 UUU &&&
2 III ***
3 OOO )))
4 PPP %%%
and
ID prop2
1 UUU 1234
2 WWW 4567
3 III 7890
5 EEE 0123
6 OOO 3456
7 RRR 6789
8 PPP 9012
I need to merge these two dataframes where the IDs match, and add the prop2 column to the original.
ID prop1 prop1
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
Ive tried every combination of merge, join, concat, for, iter, etc. It will either fail to merge, lose the index, or straight-up drop the column values.
Upvotes: 2
Views: 1680
Reputation: 917
You can also use .map
to add the prop2 values to your original dataframe, where the ID column values match.
df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy())
Should there be any IDs in your original dataframe that aren't also in the second one (and so don't have a prop2 value to bring across, you can fill those holes by adding .fillna()
with the value of your choice.
df1['prop2'] = df1['ID'].map(dict(df2[['ID', 'prop2']].to_numpy()).fillna(your_fill_value_here)
Upvotes: 0
Reputation: 23227
You can use pd.merge()
:
pd.merge(df1, df2, on='ID')
Output:
ID prop1 prop2
0 UUU &&& 1234
1 III *** 7890
2 OOO ))) 3456
3 PPP %%% 9012
You can also use df.merge()
as follows::
df1.merge(df2, on='ID')
Same result.
The default parameter on .merge()
no matter using pd.merge()
or df.merge()
is how='inner'
. So you are already doing an inner join without specifying how=
parameter.
If you require the more complicated situation to maintain the index of df1
1, 2, 3, 4
instead of 0, 1, 2, 3
, you can do it by resetting index before merge and then set index on the interim index
column produced when resetting index:
df1.reset_index().merge(df2, on='ID').set_index('index')
Output:
ID prop1 prop2
index
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
Now, the index 1 2 3 4
of original df1
are kept.
Optionally, if you don't want the axis label index
appear on top of the row index, you can do a rename_axis()
as follows:
df1.reset_index().merge(df2, on='ID').set_index('index').rename_axis(index=None)
Output:
ID prop1 prop2
1 UUU &&& 1234
2 III *** 7890
3 OOO ))) 3456
4 PPP %%% 9012
Upvotes: 4