Leif Erikson
Leif Erikson

Reputation: 23

Add new column to Multi Index DataFrame Based on Condition

I have a MultiIndex dataframe composed of an Index made up of the address of a store, and status of an order. The values are composed of the number of orders where corresponding to status of the order. So for example, one store might have 6 delivered and 2 cancelled orders.

I want to make a new column that calculates a percentage based on how many orders were cancelled, relative to the number of orders. How would I go about this? For example, if there are no cancelled orders, the percentage would be 0.

Example of the table:

store_address   final_status    number_of_orders
190             DeliveredStatus 10
191             DeliveredStatus 8
193             CanceledStatus  1
                DeliveredStatus 1
194             DeliveredStatus 12

Thanks!

Upvotes: 0

Views: 79

Answers (3)

RichieV
RichieV

Reputation: 5183

So with this sample df

df = pd.read_fwf(StringIO(
'''store_address   final_status    number_of_orders
190             DeliveredStatus 10
191             DeliveredStatus 8
193             CanceledStatus  1
193             DeliveredStatus 1
194             DeliveredStatus 12''')).set_index(['store_address', 'final_status'])

You can unstack final_status so it is the column index instead of part of the row index. Then you can easily get the percentage.

df = df.squeeze().unstack(fill_value=0)
df['cancelled_rate'] = df.CanceledStatus.div(df.sum(axis=1))

Output

final_status   CanceledStatus  DeliveredStatus  cancelled_rate
store_address
190                         0               10             0.0
191                         0                8             0.0
193                         1                1             0.5
194                         0               12             0.0

Upvotes: 1

Ian Thompson
Ian Thompson

Reputation: 3305

Assuming your starting df looks like this:

                               number_of_orders
store_address final_status                     
190           DeliveredStatus                10
191           DeliveredStatus                 8
193           CanceledStatus                  1
              DeliveredStatus                 1
194           DeliveredStatus                12

Do the following:

# unstack final_status and fill_value with 0
df_unstacked = df.unstack(fill_value=0)

print(df_unstacked)

              number_of_orders                
final_status    CanceledStatus DeliveredStatus
store_address                                 
190                          0              10
191                          0               8
193                          1               1
194                          0              12

# calculate CanceledProportion
df_unstacked['CanceledProportion'] = df_unstacked.number_of_orders.CanceledStatus / df_unstacked.sum(axis=1)

print(df_unstacked)

              number_of_orders                 CanceledProportion
final_status    CanceledStatus DeliveredStatus                   
store_address                                                    
190                          0              10                0.0
191                          0               8                0.0
193                          1               1                0.5
194                          0              12                0.0

Upvotes: 0

Alex Rajan Samuel
Alex Rajan Samuel

Reputation: 144

You can try.

import pandas as pd
import numpy as np

ll=[{'data': 'alex','d':3,'c':0},{'data': 'bc','d':2,'c':1},{'data': 'x','d':4,'c':1},{'data': 'ex','d':1,'c':1}]

df=pd.DataFrame(ll)
df['pct']=np.where(df['c']==0, 0, (df['c']/(df['d']+df['c']))*100)
print(df)

Upvotes: 0

Related Questions