Unable to call a function inside groupby.agg

Question

New to python. So please excuse mistakes. I am writing a script to group a pandas dataframe using groupby.agg. I get errors while trying to call a function that takes as input, the output of a lambda function

Here is the sample of the merged dataframe

cprdf.iloc[5:10,5:20]

Out[237]: 

   Loan Nbr  Servicer Loan Nbr Recon  Action Code  Loan Count_x  \

5  21522594           25701889     Y         0.00             1   
6  21522594           25701889     Y         0.00             1   
7  21522594           25701889     Y         0.00             1   
8  21522594           25701889     Y         0.00             1   
9  21522594           25701889     Y         0.00             1   

   Days Delinquent_x Sale Date_x  UPB Beginning  UPB Purchase  UPB Sch Prin  \

5               0.00         NaN     142,936.57          0.00        162.16   
6               0.00         NaN     143,097.92          0.00        161.35   
7               0.00         NaN     143,258.47          0.00        160.55   
8               0.00         NaN     143,418.22          0.00        159.75   
9               0.00         NaN     143,735.33          0.00        317.11   

   UPB Curtailment  UPB Liq  UPB Adjustment  UPB Non Cash  UPB Ending  
5             0.00     0.00            0.00          0.00  142,774.41  
6             0.00     0.00            0.00          0.00  142,936.57  
7             0.00     0.00            0.00          0.00  143,097.92  
8             0.00     0.00            0.00          0.00  143,258.47  
9             0.00     0.00            0.00          0.00  143,418.22

What I am trying to do is implement the following formulas for a variety of groupby operation

SMM = (UPB Curtail+UPB Liq+UPBAdj)/(UPB Begin)

CPR in % = 100*(1-(1-SMM)^12

Here is the relevant code


cprdf['NonSchP'] = cprdf['UPB Curtailment'] + cprdf['UPB Liq'] + \
                    cprdf['UPB Adjustment']


cprdf['SMM'] = np.where(cprdf['UPB Beginning'] == 0, 0,
                        cprdf['NonSchP']/cprdf['UPB Beginning'])



def wtavg(x):  
    return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])


def cpr(y):
       z = 100 * (1 - np.power((1 - y), 12))
       return z

# dictionary for new columns

n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
     'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
     'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
     'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
     'SMMAgg' : pd.NamedAgg('SMM', wtavg(cprdf['SMM'])),
     'Rate': pd.NamedAgg('Current Loan Rate',wtavg(cprdf['Current Loan Rate'])),   
     'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}

cprgroup = cprdf.groupby(['month_year'],as_index=True).agg(**n)

cprgroup.reset_index(drop=False,inplace=True)

I expect the output to be

cprgroup

Out[240]:

  month_year        UPB_sum  UPB_count      PIF_sum  PIF_count  SMM  Rate  \

0    2019-04  11,237,040.94         22   718,172.19       1.00 0.06  5.95   
1    2019-05  16,684,325.75         31         0.00       0.00 0.00  5.99   
2    2019-06 106,783,721.43        221 2,242,731.83       3.00 0.02  5.77   
3    2019-07 104,181,644.18        218 1,035,861.72       3.00 0.01  5.77   
4    2019-08 102,853,211.42        215 3,188,568.04       2.00 0.03  5.77   

    CPR  
0 54.75  
1  0.03  
2 24.07  
3 13.24  
4 31.70

Instead when I run the program I get the following error

runfile('C:/Users/spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
Traceback (most recent call last):

  File "", line 1, in 
    runfile('C:/.spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/.spyder-py3/untitled3.py", line 51, in 
    'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}

  File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
    z = 100 * (1 - np.power((1 - y), 12))

TypeError: unsupported operand type(s) for -: 'int' and 'function'

Is my mistake calling a lambda fucntion as the input for the cpr function?

When I change dictionary 'n' to use 'SMMAgg' as the input to the function

'CPR':pd.NamedAgg('SMMAgg',cpr(SMMAgg))

I get

NameError: name 'SMMAgg' is not defined

When I change the formula to

'CPR':pd.NamedAgg('SMMAgg',cpr('SMMAgg'))

I get

File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
z = 100 * (1 - np.power((1 - y), 12))

TypeError: unsupported operand type(s) for -: 'int' and 'str'

Any help would be appreaciated.

I circumvented the errors by adding the CPR function after aggregation as a new column to the grouped dataframe and was able to get the output I need. But there is something that I don't understand with calling this function inside the dictionary.

Thank you.

Unable to call a function inside groupby.agg

Answers (1)

Related Questions