Reputation: 21
New to python. So please excuse mistakes. I am writing a script to group a pandas dataframe using groupby.agg. I get errors while trying to call a function that takes as input, the output of a lambda function
Here is the sample of the merged dataframe
cprdf.iloc[5:10,5:20]
Out[237]:
Loan Nbr Servicer Loan Nbr Recon Action Code Loan Count_x \
5 21522594 25701889 Y 0.00 1
6 21522594 25701889 Y 0.00 1
7 21522594 25701889 Y 0.00 1
8 21522594 25701889 Y 0.00 1
9 21522594 25701889 Y 0.00 1
Days Delinquent_x Sale Date_x UPB Beginning UPB Purchase UPB Sch Prin \
5 0.00 NaN 142,936.57 0.00 162.16
6 0.00 NaN 143,097.92 0.00 161.35
7 0.00 NaN 143,258.47 0.00 160.55
8 0.00 NaN 143,418.22 0.00 159.75
9 0.00 NaN 143,735.33 0.00 317.11
UPB Curtailment UPB Liq UPB Adjustment UPB Non Cash UPB Ending
5 0.00 0.00 0.00 0.00 142,774.41
6 0.00 0.00 0.00 0.00 142,936.57
7 0.00 0.00 0.00 0.00 143,097.92
8 0.00 0.00 0.00 0.00 143,258.47
9 0.00 0.00 0.00 0.00 143,418.22
What I am trying to do is implement the following formulas for a variety of groupby operation
SMM = (UPB Curtail+UPB Liq+UPBAdj)/(UPB Begin)
CPR in % = 100*(1-(1-SMM)^12
Here is the relevant code
cprdf['NonSchP'] = cprdf['UPB Curtailment'] + cprdf['UPB Liq'] + \
cprdf['UPB Adjustment']
cprdf['SMM'] = np.where(cprdf['UPB Beginning'] == 0, 0,
cprdf['NonSchP']/cprdf['UPB Beginning'])
def wtavg(x):
return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])
def cpr(y):
z = 100 * (1 - np.power((1 - y), 12))
return z
# dictionary for new columns
n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
'SMMAgg' : pd.NamedAgg('SMM', wtavg(cprdf['SMM'])),
'Rate': pd.NamedAgg('Current Loan Rate',wtavg(cprdf['Current Loan Rate'])),
'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}
cprgroup = cprdf.groupby(['month_year'],as_index=True).agg(**n)
cprgroup.reset_index(drop=False,inplace=True)
I expect the output to be
cprgroup
Out[240]:
month_year UPB_sum UPB_count PIF_sum PIF_count SMM Rate \
0 2019-04 11,237,040.94 22 718,172.19 1.00 0.06 5.95
1 2019-05 16,684,325.75 31 0.00 0.00 0.00 5.99
2 2019-06 106,783,721.43 221 2,242,731.83 3.00 0.02 5.77
3 2019-07 104,181,644.18 218 1,035,861.72 3.00 0.01 5.77
4 2019-08 102,853,211.42 215 3,188,568.04 2.00 0.03 5.77
CPR
0 54.75
1 0.03
2 24.07
3 13.24
4 31.70
Instead when I run the program I get the following error
runfile('C:/Users/spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
Traceback (most recent call last):
File "<ipython-input-241-c3f795a9d003>", line 1, in <module>
runfile('C:/.spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/.spyder-py3/untitled3.py", line 51, in <module>
'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}
File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
z = 100 * (1 - np.power((1 - y), 12))
TypeError: unsupported operand type(s) for -: 'int' and 'function'
Is my mistake calling a lambda fucntion as the input for the cpr function?
When I change dictionary 'n' to use 'SMMAgg' as the input to the function
'CPR':pd.NamedAgg('SMMAgg',cpr(SMMAgg))
I get
NameError: name 'SMMAgg' is not defined
When I change the formula to
'CPR':pd.NamedAgg('SMMAgg',cpr('SMMAgg'))
I get
File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
z = 100 * (1 - np.power((1 - y), 12))
TypeError: unsupported operand type(s) for -: 'int' and 'str'
Any help would be appreaciated.
I circumvented the errors by adding the CPR function after aggregation as a new column to the grouped dataframe and was able to get the output I need. But there is something that I don't understand with calling this function inside the dictionary.
Thank you.
Upvotes: 0
Views: 100
Reputation: 21
After some research, I found a solution. One issue that i noticed( not 100% sure) is that NamegAgg does not accept the same column for multiple custom function for aggregation. So I created a dummy SMM column. I modified the CPR function by returning the lambda instead of assigning it to a new variable and returning it. I also invoked the wtavg function inside the CPR function and called the array of variables as input. So
cprdf['SMM1']=cprdf['SMM']
def wtavg():
return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])
def cpr():
return lambda y: 100 * (1 - np.power((1 - wtavg()(y)), 12))
Then my kwarg dictionary looked like this
n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
'SMMAgg' : pd.NamedAgg('SMM', wtavg()),
'Rate': pd.NamedAgg('Current Loan Rate',wtavg()),
'CPRAgg':pd.NamedAgg('SMM1',cpr())}
cprgroup=cprdf.groupby(['month_year'],as_index=True).agg(**n)
Output
cprgroup
Out[51]:
month_year UPB_sum UPB_count PIF_sum PIF_count SMMAgg \
0 2019-04 1.123704e+07 22 718172.19 1.0 0.063944
1 2019-05 1.668433e+07 31 0.00 0.0 0.000025
2 2019-06 1.067837e+08 221 2242731.83 3.0 0.022690
3 2019-07 1.041816e+08 218 1035861.72 3.0 0.011770
4 2019-08 1.028532e+08 215 3188568.04 2.0 0.031268
Rate CPRAgg
0 5.946053 54.749920
1 5.987882 0.030278
2 5.774863 24.074820
3 5.772602 13.244130
4 5.771342 31.696039
voila!
Upvotes: 1