Reputation: 23
I have a MultiIndex dataframe composed of an Index made up of the address of a store, and status of an order. The values are composed of the number of orders where corresponding to status of the order. So for example, one store might have 6 delivered and 2 cancelled orders.
I want to make a new column that calculates a percentage based on how many orders were cancelled, relative to the number of orders. How would I go about this? For example, if there are no cancelled orders, the percentage would be 0.
Example of the table:
store_address final_status number_of_orders
190 DeliveredStatus 10
191 DeliveredStatus 8
193 CanceledStatus 1
DeliveredStatus 1
194 DeliveredStatus 12
Thanks!
Upvotes: 0
Views: 79
Reputation: 5183
So with this sample df
df = pd.read_fwf(StringIO(
'''store_address final_status number_of_orders
190 DeliveredStatus 10
191 DeliveredStatus 8
193 CanceledStatus 1
193 DeliveredStatus 1
194 DeliveredStatus 12''')).set_index(['store_address', 'final_status'])
You can unstack final_status
so it is the column index instead of part of the row index. Then you can easily get the percentage.
df = df.squeeze().unstack(fill_value=0)
df['cancelled_rate'] = df.CanceledStatus.div(df.sum(axis=1))
Output
final_status CanceledStatus DeliveredStatus cancelled_rate
store_address
190 0 10 0.0
191 0 8 0.0
193 1 1 0.5
194 0 12 0.0
Upvotes: 1
Reputation: 3305
Assuming your starting df
looks like this:
number_of_orders
store_address final_status
190 DeliveredStatus 10
191 DeliveredStatus 8
193 CanceledStatus 1
DeliveredStatus 1
194 DeliveredStatus 12
Do the following:
# unstack final_status and fill_value with 0
df_unstacked = df.unstack(fill_value=0)
print(df_unstacked)
number_of_orders
final_status CanceledStatus DeliveredStatus
store_address
190 0 10
191 0 8
193 1 1
194 0 12
# calculate CanceledProportion
df_unstacked['CanceledProportion'] = df_unstacked.number_of_orders.CanceledStatus / df_unstacked.sum(axis=1)
print(df_unstacked)
number_of_orders CanceledProportion
final_status CanceledStatus DeliveredStatus
store_address
190 0 10 0.0
191 0 8 0.0
193 1 1 0.5
194 0 12 0.0
Upvotes: 0
Reputation: 144
You can try.
import pandas as pd
import numpy as np
ll=[{'data': 'alex','d':3,'c':0},{'data': 'bc','d':2,'c':1},{'data': 'x','d':4,'c':1},{'data': 'ex','d':1,'c':1}]
df=pd.DataFrame(ll)
df['pct']=np.where(df['c']==0, 0, (df['c']/(df['d']+df['c']))*100)
print(df)
Upvotes: 0