Pandas - sum column values for a given row with duplicate column names

Question

I would like somebody help me with the following task I’m trying to resolve. Please see below mock-up of my dataframe.

df= 
Date, Investor, Payment, Investor, Payment,
25/02/2018, Investor1, 100,000, Investor2, 200,000
25/04/2020, Investor1, 110,000, Investor2, 220,000

I would to get sum of all payments for a given date, i.e.

SUM all payments for a given date where column name is df[‘Payment’]. Note that column Name Payment repeats multiple times.

Many thanks for your help.

Mayank Porwal · Accepted Answer

You can do this:

First combine duplicate columns to one column by sum():

In [556]: df                                                                                                                                                                                                
Out[556]: 
         Date    Investor   Payment    Investor  Payment
0  25/02/2018   Investor1    100000   Investor2   200000
1  25/04/2020   Investor1    110000   Investor2   220000

In [563]: df = df.groupby(df.columns, axis=1).sum() 
In [562]: df
Out[562]: 
         Date              Investor  Payment
0  25/02/2018   Investor1 Investor2   300000
1  25/04/2020   Investor1 Investor2   330000

Then, group on Date and find sum of Payment:

In [564]: df.groupby('Date')['Payment'].sum()                                                                                                                                                               
Out[564]: 
Date
25/02/2018    300000
25/04/2020    330000
Name: Payment, dtype: int64

Pandas - sum column values for a given row with duplicate column names

Answers (1)

Related Questions