Carol

Reputation: 43

Search and process data from multi-index DataFrames

I have two dataframes: df2 with the payment statistics (the probability that a client pays a certain fraction of a debt) and df3 with the new clients' data.

import pandas as pd

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
     'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}

df = pd.DataFrame(data=d)

# Probability of each DebtPaid bin per (City, Card, Colateral) group
df2 = df.groupby(['City','Card','Colateral'])['DebtPaid'].\
           value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2], normalize=True)
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
     'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
     'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
     'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
     'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}

df3 = pd.DataFrame(data=d)
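
Just to make the structure explicit (a quick illustrative check; the prints are only for inspection):

print(list(df2.index.names))   # ['City', 'Card', 'Colateral', 'DebtPaid']
# df2 is a Series of normalized bin counts (probabilities) per group
print(df3.columns.tolist())    # ['City', 'Card', 'Colateral', 'Client Number', 'Total Debt']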

I want to calculate the estimated amount that a client would pay. Ex: if a client is from Lisbon, has a Visa and Colateral, they have a 0.333333 chance of paying 0% of the debt, a 0.333333 chance of paying ]0-25%] of the debt and a 0.333333 chance of paying ]75-100%] of the debt. So if this client has a debt of 100, the expected value would range from

[(0.33 * 0 * 100) + (0.33 * 0 * 100) + (0.33 * 0.75 * 100)] to [(0.33 * 0 * 100) + (0.33 * 0.25 * 100) + (0.33 * 1 * 100)].

So this client would pay from 24.75€ to 41.25€.
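
For reference, this is the hand calculation I mean for that one client (a rough sketch against df2 above; the variable names are only for illustration):

# probability of each DebtPaid bin for a Lisbon / Visa / Colateral=Yes client
probs = df2.loc[('Lisbon', 'Visa', 'Yes')]

debt = 100
# lower bound: the client pays the lower edge of each bin (floored at 0)
lower = sum(p * max(iv.left, 0) for iv, p in probs.items()) * debt
# upper bound: the client pays the upper edge of each bin
upper = sum(p * iv.right for iv, p in probs.items()) * debt
print(lower, upper)  # ~25.0 and ~41.67 with the exact 1/3 probabilities
                     # (the 24.75 / 41.25 above come from rounding 1/3 to 0.33)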

Then do the math for all the other clients.

Any idea on how to solve this ?

Upvotes: 3

Views: 43

Answers (1)

Scott Boston

Reputation: 153460

You can do it this way:

import numpy as np   # needed for np.inf in the clip below

#First merge dataframes
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])

#Use the right and left attributes of pd.Interval
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]

#Calculate lower and upper partial prices
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)

#Sum partial prices to get lower and upper price grouped on Client Number
df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
      .agg(lower_price=('l_partial', 'sum'),
           upper_price=('u_partial', 'sum')).clip(0,np.inf)

Output:

              lower_price  upper_price
Client Number                          
23               37.500000    62.500000
24                0.000000    15.000000
25               50.000000   100.000000
26              187.000000   375.000000
27                0.000000     3.125000
28                0.000000     5.000000
29              143.750000   230.000000
30                2.493333     4.166667
31              225.000000   375.000000
32               16.666667    26.666667
33               18.750000    31.250000
34                7.500000    12.500000
35                0.000000     6.250000
36                7.500000    15.000000
37               18.700000    37.500000
38                0.000000    37.500000
39                0.000000    50.000000
40               49.866667    83.333333
41               37.400000    62.500000
42              130.900000   262.500000
43               20.833333    33.333333
44               22.440000    45.000000
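
As a quick sanity check on the first row (a minimal sketch; the 0.5 probabilities come from df2 for the Tokyo / Visa / Yes group):

# Client 23 is Tokyo / Visa / Colateral=Yes with a Total Debt of 100;
# that group has probability 0.5 in the ]0-25%] bin and 0.5 in the ]75-100%] bin.
lower_23 = 0.5 * 0.00 * 100 + 0.5 * 0.75 * 100   # 37.5
upper_23 = 0.5 * 0.25 * 100 + 0.5 * 1.00 * 100   # 62.5

The clip(0, np.inf) at the end only zeroes out the tiny negative contribution that the -0.001 left edge of the first bin adds to some lower bounds.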

Upvotes: 2
