update a Dataframe from a different Dataframe based on a matching value

Question

I have two dataframes, DF1(33, 92) and DF2(11, 18) , I want to copy the DF2 18 columns to DF1 based on a matching value from a column name 'ID' in both DFs, these 18 columns have the same names in both dataframes.

I used the following merge: finaldf = pd.merge(DF1, DF2, on = 'ID', how ='left')

This works fine except it changed the 18 columns names in the DF1 and added another 18 columns. so the final dataframe shape was (33, 109) while it supposes to have DF1 shape (33, 92) but with an updated rows.

Ank · Accepted Answer

Your finaldf after merge has shape (33, 109) because it has columns with similar names but _x and _y appended to them. _x ones are from DF1 and _y ones are from DF2.

You need to run the below code after merge to remove the extra "_x" and "_y" columns for those 18 and copy the values from DF2 to DF1 where they matched on "ID":

remove_cols = []

for col in DF2.columns:
    if col == 'ID':
        continue
    finaldf[col] = finaldf[col+'_y'].fillna(finaldf[col+'_x'])
    remove_cols += [col+'_x', col+'_y']

finaldf.drop(remove_cols, axis=1, inplace=True)

For more information on why "_x" and "_y" columns appear in your merged dataframe, I would recommend you to check the official documentation of pd.DataFrame.merge method once. "_x" and "_y" are suffixes that merge operation adds by default to distinguish between columns with similar names.

Alternatively:

pd.DataFrame.update is a method in pandas to achieve what you are trying to do.

Check it out here. But there is one caveat with using it, which is that if you have NaN values in DF2 that you would like to copy to DF1, then it won't do that. It will update only non-NA values:

Modify in place using non-NA values from another DataFrame.

update a Dataframe from a different Dataframe based on a matching value

Answers (2)

Related Questions