Alani
Alani

Reputation: 73

update a Dataframe from a different Dataframe based on a matching value

I have two dataframes, DF1(33, 92) and DF2(11, 18) , I want to copy the DF2 18 columns to DF1 based on a matching value from a column name 'ID' in both DFs, these 18 columns have the same names in both dataframes.

I used the following merge: finaldf = pd.merge(DF1, DF2, on = 'ID', how ='left')

This works fine except it changed the 18 columns names in the DF1 and added another 18 columns. so the final dataframe shape was (33, 109) while it supposes to have DF1 shape (33, 92) but with an updated rows.

Upvotes: 3

Views: 697

Answers (2)

sudheer naidu
sudheer naidu

Reputation: 162

if you want the values for those 18 columns (say col1,col2...col18) from DF2 only, you can do

cols_18 = ["col1",col2"....]
cols_to_use = list(set(DF1.columns) - set(cols_18))
pd.merge(DF1[cols_to_use],DF2...), on = 'ID', how ='left')

If you want to keep the columns from both dataframes, the default suffixes are _x and _y. but you can override them like following

pd.merge(DF1,DF2...), on = 'ID', how ='left, suffixes = ["","_new"])

Now there will be 109 columns, but the main dataframe's column names stay intact. The columns from the DF2 have a suffix of "_new"

Upvotes: 0

Ank
Ank

Reputation: 1714

Your finaldf after merge has shape (33, 109) because it has columns with similar names but _x and _y appended to them. _x ones are from DF1 and _y ones are from DF2.

You need to run the below code after merge to remove the extra "_x" and "_y" columns for those 18 and copy the values from DF2 to DF1 where they matched on "ID":

remove_cols = []

for col in DF2.columns:
    if col == 'ID':
        continue
    finaldf[col] = finaldf[col+'_y'].fillna(finaldf[col+'_x'])
    remove_cols += [col+'_x', col+'_y']

finaldf.drop(remove_cols, axis=1, inplace=True)

For more information on why "_x" and "_y" columns appear in your merged dataframe, I would recommend you to check the official documentation of pd.DataFrame.merge method once. "_x" and "_y" are suffixes that merge operation adds by default to distinguish between columns with similar names.


Alternatively:

pd.DataFrame.update is a method in pandas to achieve what you are trying to do.

Check it out here. But there is one caveat with using it, which is that if you have NaN values in DF2 that you would like to copy to DF1, then it won't do that. It will update only non-NA values:

Modify in place using non-NA values from another DataFrame.

Upvotes: 4

Related Questions