Setting value of columns in one dataframe to another dataframe column based on condition

Question

I have a dataframe df1 :-

id	Group	Family	Bonus
1	tri23_1	Laavin	A
2	hsgç_T2	Grendy	B
3	bbbj-1Y_jn	Fantol	C
4	hsgç_T2	Gondow	D

and have a dataframe df2 :-

id	Group	Family	Bonus
1	tri	Brendy	B
2	hsgç_T2	Fantol	A
5	bbbj-1Y_jn	Fantol	E
6	hsgç_T3	Gondow	D

For id in df1 that are present in df2,I want to set Group ,Family,Bonus columns alone of df2 in df1. Expected df1:-

id	Group	Family	Bonus
1	tri	Brendy	B
2	hsgç_T2	Fantol	A
3	bbbj-1Y_jn	Fantol	C
4	hsgç_T2	Gondow	D

Vladimir Fokow · Accepted Answer

shared_columns = ['Group', 'Family', 'Bonus']

One-liner:

df1.update(df1.set_index('id')[shared_columns]
              .align(df2.set_index('id'), join='left')[1]
              .set_axis(df1.index, axis=0))

My original solution:

replacements = (pd.merge(df1, df2, on='id', how='left')
                  .iloc[:, -len(shared_columns):])
replacements.columns = shared_columns

df1.update(replacements)

Example:

df1 = pd.DataFrame([[1,  1, 0.1, 1], 
                    [2,  2, 0.2, 2], 
                    [3,  3, 0.3, 3]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3,  30, 300, 3], 
                    [1,  10, 100, 1], 
                    [4,  40, 400, 4]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])

df1:

   id  Group  Family  Bonus
0   1      1     0.1      1
1   2      2     0.2      2
2   3      3     0.3      3

df2:

   id  Group  Family  Bonus
0   3     30     300      3
1   1     10     100      1
2   4     40     400      4

pd.merge(df1, df2, on='id', how='left' performs a left-merge, on the column id:

   id  Group_x  Family_x  Bonus_x  Group_y  Family_y  Bonus_y
0   1        1       0.1        1     10.0     100.0      1.0
1   2        2       0.2        2      NaN       NaN      NaN
2   3        3       0.3        3     30.0     300.0      3.0

replacements (after selecting the last columns, and renaming them):

   Group  Family  Bonus
0   10.0   100.0    1.0
1    NaN     NaN    NaN
2   30.0   300.0    3.0

Resulting df1:

   id  Group  Family  Bonus
0   1   10.0   100.0    1.0
1   2    2.0     0.2    2.0
2   3   30.0   300.0    3.0

Another way to select the right columns after merging:

By default, .merge uses suffixes=('_x', '_y'). So to select the columns of the right data frame, we can append '_y' to each its column name:

merged = pd.merge(df1, df2, on='id', how='left')
right_colnames = [colname + '_y' for colname in shared_columns]
replacements = merged[right_colnames]

   Group_y  Family_y  Bonus_y
0     10.0     100.0      1.0
1      NaN       NaN      NaN
2     30.0     300.0      3.0

Another solution:

temp_df = df1.reset_index().set_index('id')
temp_df.update(df2.set_index('id')[shared_columns])
df1 = temp_df.reset_index().set_index('index').rename_axis(None, axis=0)

Setting value of columns in one dataframe to another dataframe column based on condition

Answers (2)

Related Questions