Reputation: 99
I am a former Excel power user repenting for his sins. I need help recreating a common calculation for me.
I am trying to calculate the performance of a loan portfolio. In the numerator, I am calculating the cumulative total of losses. In the denominator, I need the original balance of the loans included in the cumulative total.
I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping that I am overthinking it.
I could not find much on the issue on StackOverflow, but this was the closest: python pandas conditional cumulative sum
The thing I cannot figure out is that my conditions are based on values in the index and contained in columns
Below is my data:
| Loan | Origination | Balance | NCO Date | NCO | As of Date | Age (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1 | 1/31/2011 | 1000 | 1/31/2018 | 25 | 5/31/2019 | 100 | 84 |
| Loan 2 | 3/31/2011 | 2500 | | 0 | 5/31/2019 | 98 | |
| Loan 3 | 5/31/2011 | 3000 | 1/31/2019 | 15 | 5/31/2019 | 96 | 92 |
| Loan 4 | 7/31/2011 | 2500 | | 0 | 5/31/2019 | 94 | |
| Loan 5 | 9/30/2011 | 1500 | 3/31/2019 | 35 | 5/31/2019 | 92 | 90 |
| Loan 6 | 11/30/2011 | 2500 | | 0 | 5/31/2019 | 90 | |
| Loan 7 | 1/31/2012 | 1000 | 5/31/2019 | 5 | 5/31/2019 | 88 | 88 |
| Loan 8 | 3/31/2012 | 2500 | | 0 | 5/31/2019 | 86 | |
| Loan 9 | 5/31/2012 | 1000 | | 0 | 5/31/2019 | 84 | |
| Loan 10 | 7/31/2012 | 1250 | | 0 | 5/31/2019 | 82 | |
In Excel, I would calculate this total using the following formulas:
Outstanding Balance Line: =SUMIFS(Balance,Age (Months),Reference Age)
Cumulative NCO: =SUMIFS(NCO,Age (Months),>=Reference Age,NCO Age (Months),<=&Reference Age)
Data:
| Reference Age | 85 | 90 | 95 | 100
|---------------------|-------|-------|------|------
| Outstanding Balance | 16500 | 13000 | 6500 | 1000
| Cumulative NCO | 25 | 60 | 40 | 25
The goal here is to include things in Outstanding Balance that are old enough to have an observation for NCO. And NCOs are the total amount that have occurred up until that point for those loans outstanding.
EDIT:
I have gotten a calculation this way. But is this the most efficient?
age_bins = list(np.arange(85, 101, 5))
final_df = pd.DataFrame()
df.fillna(value=0, inplace=True)
df["NCO Age (Months)"] = df["NCO Age (Months)"].astype(int)
for x in age_bins:
age = x
nco = df.loc[(df["Age (Months)"] >= x) & (df["NCO Age (Months)"] <= x), "NCO"].sum()
bal = df.loc[(df["Age (Months)"] >= x), "Balance"].sum()
temp_df = pd.DataFrame(
data=[[age, nco, bal]],
columns=["Age", "Cumulative NCO", "Outstanding Balance"],
index=[age],
)
final_df = final_df.append(temp_df, sort=True)
Upvotes: 9
Views: 812
Reputation: 148880
You use a complex conditions depending on variables. It is easy to find a vectorized way for simple cumulative sums, but I cannot imagine a nice way for the Cumulative NCO.
So I would revert to Python comprehensions:
data = [
{ 'Reference Age': ref,
'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
'NCO'].sum() }
for ref in [85, 90, 95, 100]]
result = pd.DataFrame(data).set_index('Reference Age').T
It produces:
Reference Age 85 90 95 100
Cumulative NCO 25 60 40 25
Outstanding Balance 16500 13000 6500 1000
Upvotes: 2
Reputation: 563
Not sure that I completly follow the exact logic you are going for, but you can accomplish a sumifs with the combination of pandas query
and groupby
.
import pandas as pd
import numpy as np
age = np.random.randint(85, 100, 50)
balance = np.random.randint(1000, 2500, 50)
nco = np.random.randint(85, 100, 50)
df = pd.DataFrame({'age': age, 'balance': balance, 'nco':nco})
df['reference_age'] = df['age'].apply(lambda x: 5 * round(float(x)/5))
outstanding_balance = (
df
.query('age >= reference_age')
.groupby('reference_age')
[['balance']]
.sum()
.rename(columns={'balance': 'Outstanding Balance'}
)
cumulative_nco = (
df
.query('age < reference_age')
.groupby('reference_age')
[['nco']]
.sum()
.rename(columns={'nco': 'cumulative nco'})
.cumsum()
)
result = outstanding_balance.join(cumulative_sum).T
reference_age 85 90 95
Outstanding Balance 2423.0 16350.0 13348.0
cumulative nco NaN 645.0 1107.0
Upvotes: 0
Reputation: 173
You could try to build groups of loans in a given age range using pd.cut and use groupby afterwards. Something like this:
import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4, 5], [7, 8, 9, 10, 11]], index=['age', 'value']).T
df['groups'] = pd.cut(df.age, [0, 1, 3, 5]) # define bins (0,1], (1,3], (3,5]
df.groupby('groups')['value'].sum()
Upvotes: 0