g612002
g612002

Reputation: 21

Unable to call a function inside groupby.agg

New to python. So please excuse mistakes. I am writing a script to group a pandas dataframe using groupby.agg. I get errors while trying to call a function that takes as input, the output of a lambda function

Here is the sample of the merged dataframe

cprdf.iloc[5:10,5:20]

Out[237]: 

   Loan Nbr  Servicer Loan Nbr Recon  Action Code  Loan Count_x  \

5  21522594           25701889     Y         0.00             1   
6  21522594           25701889     Y         0.00             1   
7  21522594           25701889     Y         0.00             1   
8  21522594           25701889     Y         0.00             1   
9  21522594           25701889     Y         0.00             1   

   Days Delinquent_x Sale Date_x  UPB Beginning  UPB Purchase  UPB Sch Prin  \

5               0.00         NaN     142,936.57          0.00        162.16   
6               0.00         NaN     143,097.92          0.00        161.35   
7               0.00         NaN     143,258.47          0.00        160.55   
8               0.00         NaN     143,418.22          0.00        159.75   
9               0.00         NaN     143,735.33          0.00        317.11   

   UPB Curtailment  UPB Liq  UPB Adjustment  UPB Non Cash  UPB Ending  
5             0.00     0.00            0.00          0.00  142,774.41  
6             0.00     0.00            0.00          0.00  142,936.57  
7             0.00     0.00            0.00          0.00  143,097.92  
8             0.00     0.00            0.00          0.00  143,258.47  
9             0.00     0.00            0.00          0.00  143,418.22  

What I am trying to do is implement the following formulas for a variety of groupby operation

SMM = (UPB Curtail+UPB Liq+UPBAdj)/(UPB Begin)

CPR in % = 100*(1-(1-SMM)^12

Here is the relevant code


cprdf['NonSchP'] = cprdf['UPB Curtailment'] + cprdf['UPB Liq'] + \
                    cprdf['UPB Adjustment']


cprdf['SMM'] = np.where(cprdf['UPB Beginning'] == 0, 0,
                        cprdf['NonSchP']/cprdf['UPB Beginning'])



def wtavg(x):  
    return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])


def cpr(y):
       z = 100 * (1 - np.power((1 - y), 12))
       return z

# dictionary for new columns

n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
     'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
     'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
     'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
     'SMMAgg' : pd.NamedAgg('SMM', wtavg(cprdf['SMM'])),
     'Rate': pd.NamedAgg('Current Loan Rate',wtavg(cprdf['Current Loan Rate'])),   
     'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}

cprgroup = cprdf.groupby(['month_year'],as_index=True).agg(**n)

cprgroup.reset_index(drop=False,inplace=True)   

I expect the output to be

cprgroup

Out[240]:

  month_year        UPB_sum  UPB_count      PIF_sum  PIF_count  SMM  Rate  \

0    2019-04  11,237,040.94         22   718,172.19       1.00 0.06  5.95   
1    2019-05  16,684,325.75         31         0.00       0.00 0.00  5.99   
2    2019-06 106,783,721.43        221 2,242,731.83       3.00 0.02  5.77   
3    2019-07 104,181,644.18        218 1,035,861.72       3.00 0.01  5.77   
4    2019-08 102,853,211.42        215 3,188,568.04       2.00 0.03  5.77   

    CPR  
0 54.75  
1  0.03  
2 24.07  
3 13.24  
4 31.70 

Instead when I run the program I get the following error

runfile('C:/Users/spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-241-c3f795a9d003>", line 1, in <module>
    runfile('C:/.spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/.spyder-py3/untitled3.py", line 51, in <module>
    'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}

  File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
    z = 100 * (1 - np.power((1 - y), 12))

TypeError: unsupported operand type(s) for -: 'int' and 'function'

Is my mistake calling a lambda fucntion as the input for the cpr function?

When I change dictionary 'n' to use 'SMMAgg' as the input to the function

'CPR':pd.NamedAgg('SMMAgg',cpr(SMMAgg))

I get

NameError: name 'SMMAgg' is not defined

When I change the formula to

'CPR':pd.NamedAgg('SMMAgg',cpr('SMMAgg'))

I get

File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
z = 100 * (1 - np.power((1 - y), 12))

TypeError: unsupported operand type(s) for -: 'int' and 'str'

Any help would be appreaciated.

I circumvented the errors by adding the CPR function after aggregation as a new column to the grouped dataframe and was able to get the output I need. But there is something that I don't understand with calling this function inside the dictionary.

Thank you.

Upvotes: 0

Views: 100

Answers (1)

g612002
g612002

Reputation: 21

After some research, I found a solution. One issue that i noticed( not 100% sure) is that NamegAgg does not accept the same column for multiple custom function for aggregation. So I created a dummy SMM column. I modified the CPR function by returning the lambda instead of assigning it to a new variable and returning it. I also invoked the wtavg function inside the CPR function and called the array of variables as input. So

cprdf['SMM1']=cprdf['SMM']
def wtavg():  
    return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])

def cpr():
       return lambda y:  100 * (1 - np.power((1 - wtavg()(y)), 12))

Then my kwarg dictionary looked like this

n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
     'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
    'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
     'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
     'SMMAgg' : pd.NamedAgg('SMM', wtavg()),
     'Rate': pd.NamedAgg('Current Loan Rate',wtavg()),   
     'CPRAgg':pd.NamedAgg('SMM1',cpr())} 
cprgroup=cprdf.groupby(['month_year'],as_index=True).agg(**n)

Output

cprgroup
Out[51]: 
  month_year       UPB_sum  UPB_count     PIF_sum  PIF_count    SMMAgg  \
0    2019-04  1.123704e+07         22   718172.19        1.0  0.063944   
1    2019-05  1.668433e+07         31        0.00        0.0  0.000025   
2    2019-06  1.067837e+08        221  2242731.83        3.0  0.022690   
3    2019-07  1.041816e+08        218  1035861.72        3.0  0.011770   
4    2019-08  1.028532e+08        215  3188568.04        2.0  0.031268   

       Rate     CPRAgg  
0  5.946053  54.749920  
1  5.987882   0.030278  
2  5.774863  24.074820  
3  5.772602  13.244130  
4  5.771342  31.696039  

voila!

Upvotes: 1

Related Questions