Reputation: 789
I have a dataframe df1
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri23_1 | Laavin | A |
2 | hsgç_T2 | Grendy | B |
3 | bbbj-1Y_jn | Fantol | C |
4 | hsgç_T2 | Gondow | D |
and have a dataframe df2
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri | Brendy | B |
2 | hsgç_T2 | Fantol | A |
5 | bbbj-1Y_jn | Fantol | E |
6 | hsgç_T3 | Gondow | D |
For id
in df1
that are present in df2
,I want to set Group
,Family
,Bonus
columns alone of df2
in df1
.
Expected df1
:-
id | Group | Family | Bonus |
---|---|---|---|
1 | tri | Brendy | B |
2 | hsgç_T2 | Fantol | A |
3 | bbbj-1Y_jn | Fantol | C |
4 | hsgç_T2 | Gondow | D |
Upvotes: 0
Views: 68
Reputation: 3883
shared_columns = ['Group', 'Family', 'Bonus']
One-liner:
df1.update(df1.set_index('id')[shared_columns]
.align(df2.set_index('id'), join='left')[1]
.set_axis(df1.index, axis=0))
My original solution:
replacements = (pd.merge(df1, df2, on='id', how='left')
.iloc[:, -len(shared_columns):])
replacements.columns = shared_columns
df1.update(replacements)
Example:
df1 = pd.DataFrame([[1, 1, 0.1, 1],
[2, 2, 0.2, 2],
[3, 3, 0.3, 3]],
columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3, 30, 300, 3],
[1, 10, 100, 1],
[4, 40, 400, 4]],
columns=['id', 'Group', 'Family', 'Bonus'])
df1
:
id Group Family Bonus
0 1 1 0.1 1
1 2 2 0.2 2
2 3 3 0.3 3
df2
:
id Group Family Bonus
0 3 30 300 3
1 1 10 100 1
2 4 40 400 4
pd.merge(df1, df2, on='id', how='left'
performs a left-merge, on the column id
:
id Group_x Family_x Bonus_x Group_y Family_y Bonus_y
0 1 1 0.1 1 10.0 100.0 1.0
1 2 2 0.2 2 NaN NaN NaN
2 3 3 0.3 3 30.0 300.0 3.0
replacements
(after selecting the last columns, and renaming them):
Group Family Bonus
0 10.0 100.0 1.0
1 NaN NaN NaN
2 30.0 300.0 3.0
Resulting df1
:
id Group Family Bonus
0 1 10.0 100.0 1.0
1 2 2.0 0.2 2.0
2 3 30.0 300.0 3.0
Another way to select the right columns after merging:
By default, .merge uses suffixes=('_x', '_y')
. So to select the columns of the right data frame, we can append '_y'
to each its column name:
merged = pd.merge(df1, df2, on='id', how='left')
right_colnames = [colname + '_y' for colname in shared_columns]
replacements = merged[right_colnames]
Group_y Family_y Bonus_y
0 10.0 100.0 1.0
1 NaN NaN NaN
2 30.0 300.0 3.0
Another solution:
temp_df = df1.reset_index().set_index('id')
temp_df.update(df2.set_index('id')[shared_columns])
df1 = temp_df.reset_index().set_index('index').rename_axis(None, axis=0)
Upvotes: 1
Reputation: 11650
if setting index on DF is acceptable, then this is one way to accomplish it. Using update
df.set_index('id', inplace=True)
df.update(df2.set_index('id'))
df
Group Family Bonus
id
1 tri Brendy B
2 hsgç_T2 Fantol A
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D
Upvotes: 0