Serg
Serg

Reputation: 13

How do I merge two Pandas DataFrames and add the overlapping columns

I am trying to merge multiple DataFrames on same DocID then sum up the weights but when I do merge it creates Weight_x,Weight_y. This would be fine for only two DataFrames but the amount of Dataframes to merge changes based on user input so merging creates Weight_x, Weight_y multiple times. So how can I merge more than 2 DataFrames such that they are merging on DocID and Weight is Summed?

Example:

df1= DocID Weight
     1     4
     2     7
     3     8
df2= DocID Weight
     1     5
     2     9
     8     1
finalDf=
     DocID Weight
     1     9
     2     16

Upvotes: 1

Views: 330

Answers (3)

Derek O
Derek O

Reputation: 19545

You can merge, set the 'DocID' column as the index, then sum the remaining columns together. Then you can reindex and rename the columns in the resulting final_df as needed:

df_final = pd.merge(df1, df2, on=['DocID']).set_index(['DocID']).sum(axis=1)
df_final = pd.DataFrame({"DocID": df_final.index, "Weight":df_final}).reset_index(drop=True)

Output:

>>> df_final
   DocID  Weight
0      1       9
1      2      16

Upvotes: 1

sammywemmy
sammywemmy

Reputation: 28649

df1.set_index('DocID').add(df2.set_index('DocID')).dropna()

        Weight
DocID   
1       9.0
2       16.0

Upvotes: 1

XXavier
XXavier

Reputation: 1226

Can you try this pd.merge(df1, df2, on=['DocID']).set_index(['DocID']).sum(axis=1) You can now give any name to the sum column.

Upvotes: 0

Related Questions