Reputation: 315
I have a pandas dataframe with that contains 2 attributes (State & Office ID) and two measures ( Sales and Units) for that office.
I want to calculate the percentage of sales per office in a given state (total of all percentages in each state is 100%). I would like to do the same thing for units. Similar to this problem, but I have multiple measures (link).
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
'office_id': list(range(1, 7)) * 2,
'sales': [np.random.randint(100000, 999999)
for _ in range(12)],
'units': [np.random.randint(100, 999)
for _ in range(12)]})
state_office = df.groupby(['state', 'office_id'])['sales','units'].sum()
I would like to solve it like this (or maybe with a loop if I have many measures):
state_pcts['sales'] = state_office.groupby(level=0).apply(lambda x: 100 * x['sales'] / float(x['sales'].sum()))
state_pcts['units'] = state_pcts.groupby(level=0).apply(lambda x: 100 * x['units'] / float(x['units'].sum()))
But this does not work. It's possible to solve this by setting and resetting the index for every column, but there must be a better way with Lambda. EG
state_office.set_index(["state", "office_id","units"], inplace = True)
state_pcts = state_office.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()
state_pcts.set_index(["state", "office_id","sales"], inplace = True)
state_pcts = state_pcts.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()
Upvotes: 1
Views: 563
Reputation: 150735
You can use groupby().transform()
to keep the original index:
state_office.div(state_office.groupby(level=0).transform('sum')).mul(100)
Output:
sales units
state office_id
AZ 2 16.981365 31.059160
4 19.250033 23.664122
6 63.768601 45.276718
CA 1 19.331879 22.049287
3 33.858747 24.254215
5 46.809373 53.696498
CO 1 36.851857 29.506546
3 19.874290 35.246727
5 43.273852 35.246727
WA 2 34.707233 34.645669
4 35.511259 16.596002
6 29.781508 48.758328
Upvotes: 1