Platalea Minor
Platalea Minor

Reputation: 877

Sum two rows if two cells are the same but in different order

Similar to below

Buyer Seller Amount
John  Mary   3
Mary  John   2
David Bosco  2

Where I want to sum John and Mary rows into one

Expected out come

Trade1 Trade2 Amount
John   Mary   5
David  Bosco  2

My dataframe has around 6000 rows. Thank you for your help

Upvotes: 1

Views: 254

Answers (1)

jezrael
jezrael

Reputation: 863301

First sort values by numpy.sort and create boolean mask by DataFrame.duplicated and then aggregate sum:

df[['Buyer','Seller']] = pd.DataFrame(np.sort(df[['Buyer','Seller']], axis=1))

df2 = df.groupby(['Buyer','Seller'], as_index=False)['Amount'].sum()
df2.columns = ['Trade1','Trade2','Amount']
print (df2)
  Trade1 Trade2  Amount
0  Bosco  David       2
1   John   Mary       5

If dont want modify original columns use syntactic sugar - groupby with Series:

df1 = pd.DataFrame(np.sort(df[['Buyer','Seller']], axis=1))
df1.columns = ['Trade1','Trade2']

df2 = df['Amount'].groupby([df1['Trade1'],df1['Trade2']]).sum().reset_index()
print (df2)
  Trade1 Trade2  Amount
0  Bosco  David       2
1   John   Mary       5

Upvotes: 1

Related Questions