Reputation: 661
I have a dataframe:
df = pd.DataFrame({
'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
'Product': ['AA', 'AA', 'BB', 'AA'],
'Risk': ['High', 'High','Low', 'High'],
'202101': [ 130, 200, 190, 210],
'202102': [ 130, 200, 190, 210],
'202103': [ 130, 200, 190, 210],})
I would like to groupby Product
and Risk
and sum the entries in Total Assets
and Total Promo
and subtract the result from the entries in Total Int
. I could multiply all rows with Total Int
by -1 and sum the result. But I wanted to know if there was a direct way to do so.
df.groupby(['Product', 'Risk']).sum()
The actual dataset is large and it would introduce complexity to multiply certain rows by -1
The output would look like:
df = pd.DataFrame({
'Product': ['AA', 'BB'],
'Risk': ['High', 'Low'],
'202101': [ 120, 190],
'202102': [ 120, 190],
'202103': [ 120, 190],})
Upvotes: 1
Views: 534
Reputation: 2924
How about this as a solution?
(df.
melt(['Metric', 'Product', 'Risk']).
pivot(index=['Product', 'Risk', 'variable'], columns= 'Metric', values = 'value').
assign(Total = lambda df: df['Total Assets'].fillna(0)+df['Total Promo'].fillna(0) - df['Total Int'].fillna(0)).
drop(columns = ['Total Assets', 'Total Promo', 'Total Int']).
reset_index().
pivot(index=['Product', 'Risk'], columns= 'variable', values = 'Total')
)
Upvotes: 0
Reputation: 16683
In your actual dataset, do you have any groups that only have one row? The following solution will work if all groups have greater than one row, so that diff()
, doesn't return nan
. This is thy the second row of output is not in there, but I imagine your groups have more than one row in your large dataset.
IIUC, create a series s
that differentiates the two groups and take the diff
after a groupby
of the sum
:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
'Product': ['AA', 'AA', 'BB', 'AA'],
'Risk': ['High', 'High','Low', 'High'],
'Col1': [ 130, 200, 190, 210],
'Col2': [ 130, 200, 190, 210],
'Col3': [ 130, 200, 190, 210],})
s = np.where(df['Metric'].isin(['Total Assets', 'Total Promo']), 'B', 'A')
cols = ['Product', 'Risk']
(df.groupby(cols + [s]).sum()
.groupby(cols).diff()
.dropna().reset_index().drop('level_2', axis=1))
Out[1]:
Product Risk Col1 Col2 Col3
0 AA High 120.0 120.0 120.0
Upvotes: 1
Reputation: 120399
You can multiply by -1 your Total Int
rows:
df.loc[df['Metric'] == 'Total Int', df.select_dtypes('number').columns] *= -1
# OR
df.loc[df['Metric'] == 'Total Int', df.filter(regex=r'\d{6}').columns] *= -1
>>> df.groupby(['Product', 'Risk']).sum()
202101 202102 202103
Product Risk
AA High 120 120 120
BB Low 190 190 190
Upvotes: 3