Reputation: 43
I have two dataframes df2, with the payment statistics (that have the probability of the client pay a certain debt) and df3 with new clients data.
import pandas as pd
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}
df = pd.DataFrame(data=d)
df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}
df3 = pd.DataFrame(data=d)
I want to calculate the estimate amout that a client would pay . Ex: If a client is from Lisbon , have Visa and Colateral, it have 0.333333 change of paying 0% of debt, 0.3333% change of paying ]0-25%] of debt and 0,3333% of paying ]0,75-1]% of the debt. So if this client have a debt of 100, the expected value whould be from
[(0,33 * 0 * 100)+(0 * 0 * 100)+(0,33 * 0,75 * 100] to [(0,33 * 0 * 100 + 0,33 * 0,25 * 100+0,33 * 1 * 100).
So this client would pay from 24,75€ to 41,25€.
Then do the math for all the other clients.
Any idea on how to solve this ?
Upvotes: 3
Views: 43
Reputation: 153460
You can do it this way:
#First merge dataframes
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])
#Use the right and left attributes of pd.Interval
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]
#Calculate lower and upper partial prices
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)
#Sum partial prices to get lower and upper price grouped on Client Number
df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
.agg(lower_price=('l_partial', 'sum'),
upper_price=('u_partial', 'sum')).clip(0,np.inf)
Output:
lower_price upper_price
Client Number
23 37.500000 62.500000
24 0.000000 15.000000
25 50.000000 100.000000
26 187.000000 375.000000
27 0.000000 3.125000
28 0.000000 5.000000
29 143.750000 230.000000
30 2.493333 4.166667
31 225.000000 375.000000
32 16.666667 26.666667
33 18.750000 31.250000
34 7.500000 12.500000
35 0.000000 6.250000
36 7.500000 15.000000
37 18.700000 37.500000
38 0.000000 37.500000
39 0.000000 50.000000
40 49.866667 83.333333
41 37.400000 62.500000
42 130.900000 262.500000
43 20.833333 33.333333
44 22.440000 45.000000
Upvotes: 2