postcolonialist
postcolonialist

Reputation: 661

Groupby sum and difference of rows in a pandas dataframe

I have a dataframe:

df = pd.DataFrame({ 
    'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
    'Product': ['AA', 'AA', 'BB', 'AA'],
    'Risk': ['High', 'High','Low', 'High'],
    '202101': [ 130, 200, 190, 210], 
    '202102': [ 130, 200, 190, 210],
    '202103': [ 130, 200, 190, 210],})

I would like to groupby Product and Risk and sum the entries in Total Assets and Total Promo and subtract the result from the entries in Total Int. I could multiply all rows with Total Int by -1 and sum the result. But I wanted to know if there was a direct way to do so.

df.groupby(['Product', 'Risk']).sum()

The actual dataset is large and it would introduce complexity to multiply certain rows by -1

The output would look like:

df = pd.DataFrame({ 
    'Product': ['AA', 'BB'],
    'Risk': ['High', 'Low'],
    '202101': [ 120, 190], 
    '202102': [ 120, 190],
    '202103': [ 120, 190],})

Upvotes: 1

Views: 534

Answers (3)

Quixotic22
Quixotic22

Reputation: 2924

How about this as a solution?

(df.
 melt(['Metric', 'Product', 'Risk']).
 pivot(index=['Product', 'Risk', 'variable'], columns= 'Metric', values = 'value').
 assign(Total = lambda df: df['Total Assets'].fillna(0)+df['Total Promo'].fillna(0) - df['Total Int'].fillna(0)).
 drop(columns = ['Total Assets', 'Total Promo', 'Total Int']).
 reset_index().
 pivot(index=['Product', 'Risk'], columns= 'variable', values = 'Total')
 )

Upvotes: 0

David Erickson
David Erickson

Reputation: 16683

In your actual dataset, do you have any groups that only have one row? The following solution will work if all groups have greater than one row, so that diff(), doesn't return nan. This is thy the second row of output is not in there, but I imagine your groups have more than one row in your large dataset.

IIUC, create a series s that differentiates the two groups and take the diff after a groupby of the sum:

 import pandas as pd
import numpy as np

df = pd.DataFrame({ 
    'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
    'Product': ['AA', 'AA', 'BB', 'AA'],
    'Risk': ['High', 'High','Low', 'High'],
    'Col1': [ 130, 200, 190, 210], 
    'Col2': [ 130, 200, 190, 210],
    'Col3': [ 130, 200, 190, 210],})

s = np.where(df['Metric'].isin(['Total Assets', 'Total Promo']), 'B', 'A')

cols = ['Product', 'Risk']
(df.groupby(cols + [s]).sum()
   .groupby(cols).diff()
   .dropna().reset_index().drop('level_2', axis=1))

Out[1]: 
  Product  Risk   Col1   Col2   Col3
0      AA  High  120.0  120.0  120.0

Upvotes: 1

Corralien
Corralien

Reputation: 120399

You can multiply by -1 your Total Int rows:

df.loc[df['Metric'] == 'Total Int', df.select_dtypes('number').columns] *= -1

# OR

df.loc[df['Metric'] == 'Total Int', df.filter(regex=r'\d{6}').columns] *= -1
>>> df.groupby(['Product', 'Risk']).sum()
              202101  202102  202103
Product Risk
AA      High     120     120     120
BB      Low      190     190     190

Upvotes: 3

Related Questions