Pandas groupby percentage of total and add subtotals

Question

Similar to this previous post, I would like to compute each row's percentage within its group, but based on the sum of multiple columns, and also add subtotal rows. For example, given the DataFrame below:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
               'office_id': list(range(1, 7)) * 2,
               'sales': [np.random.randint(100000, 999999) for _ in range(12)],
               'sales2': [np.random.randint(100000, 999999) for _ in range(12)],
               'sales3': [np.random.randint(100000, 999999) for _ in range(12)]})

The ideal results would yield:

Update

It would be ideal to group by both state and office_id, for situations where the office_id column contains repeated values. Here is an example:

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
               'office_id': [1,1,1,2,2,2] * 2,
               'sales': [np.random.randint(100000, 999999) for _ in range(12)],
               'sales2': [np.random.randint(100000, 999999) for _ in range(12)],
               'sales3': [np.random.randint(100000, 999999) for _ in range(12)]})

This should then yield:

BENY · Accepted Answer

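We need to build this step by step: use groupby to compute the per-state column sums and append them as a 'Subtotal' row for each group, then use transform to get each state's total and divide by it. Because each state's group in out contains both the original rows and their subtotal row, the transformed sum is twice the state's true total, hence the division by 2.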

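# Per-state column sums, labelled 'Subtotal' in the office_id level
s = df.groupby('state')[['sales','sales2','sales3']].sum().assign(office_id = 'Subtotal').set_index('office_id',append=True)
# Stack the subtotal rows under the original rows and group them by state
out = pd.concat([df,s.reset_index()]).sort_values('state')
# Row total across the three sales columns
out['Subtotal'] = out[['sales','sales2','sales3']].sum(axis=1)
# Each state's sum counts the rows and their subtotal row, so halve it
v = out.groupby('state')['Subtotal'].transform('sum')/2
out.update(out[['sales','sales2','sales3','Subtotal']].div(v,axis=0))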
out
   state office_id     sales    sales2    sales3  Subtotal
3     AZ         4  0.047124  0.175385  0.118068  0.340578
7     AZ         2  0.041571  0.087926  0.087902  0.217399
11    AZ         6  0.156107  0.131998  0.153919  0.442023
0     AZ  Subtotal  0.244802  0.395309  0.359889  1.000000
0     CA         1  0.062026  0.127860  0.145870  0.335756
4     CA         5  0.150188  0.107702  0.068203  0.326092
8     CA         3  0.108636  0.129193  0.100323  0.338152
1     CA  Subtotal  0.320849  0.364755  0.314396  1.000000
2     CO         3  0.058604  0.072756  0.142734  0.274095
6     CO         1  0.108667  0.208210  0.145513  0.462390
10    CO         5  0.127604  0.095630  0.040282  0.263516
2     CO  Subtotal  0.294875  0.376596  0.328529  1.000000
1     WA         2  0.106233  0.081434  0.085797  0.273463
5     WA         6  0.091156  0.127159  0.138270  0.356585
9     WA         4  0.108694  0.195807  0.065451  0.369952
3     WA  Subtotal  0.306083  0.404399  0.289518  1.000000
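
For reference, the same steps can be wrapped in a small helper. This is only a sketch under stated assumptions, not the answer's own code: the function name pct_with_subtotals and its parameters are hypothetical, the three sales columns from the question are assumed, and it swaps out.update for plain column assignment. It also computes each state's total before stacking the subtotal rows, so no division by 2 is needed.

```python
import numpy as np
import pandas as pd

VALUE_COLS = ['sales', 'sales2', 'sales3']  # assumed: value columns from the question


def pct_with_subtotals(df, group_col='state', label_col='office_id'):
    """Hypothetical helper: each value as a share of its group's grand total,
    with one 'Subtotal' row appended per group."""
    # Per-group column sums; these become the subtotal rows.
    sub = df.groupby(group_col, as_index=False)[VALUE_COLS].sum()
    sub[label_col] = 'Subtotal'

    # Grand total per group across all value columns, taken before stacking.
    group_total = sub.set_index(group_col)[VALUE_COLS].sum(axis=1)

    # Cast the label column to object so ints and 'Subtotal' can share it.
    detail = df.astype({label_col: object})
    out = pd.concat([detail, sub], ignore_index=True)
    # A stable sort keeps each group's subtotal row after its detail rows.
    out = out.sort_values(group_col, kind='stable').reset_index(drop=True)

    # Row total, then divide every value by the group's grand total.
    out['Subtotal'] = out[VALUE_COLS].sum(axis=1)
    cols = VALUE_COLS + ['Subtotal']
    denom = out[group_col].map(group_total)
    out[cols] = out[cols].div(denom, axis=0)
    return out


# Usage with the frame from the question.
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)],
                   'sales2': [np.random.randint(100000, 999999) for _ in range(12)],
                   'sales3': [np.random.randint(100000, 999999) for _ in range(12)]})
result = pct_with_subtotals(df)
print(result)
```

In this version the index is reset, so the duplicate index labels visible in the output above do not appear, and every 'Subtotal' row should end with 1.0 in the Subtotal column.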
