Reputation: 99

Conditional Cumulative Sums in Pandas

I am a former Excel power user repenting for his sins. I need help recreating a common calculation for me.

I am trying to calculate the performance of a loan portfolio. In the numerator, I am calculating the cumulative total of losses. In the denominator, I need the original balance of the loans included in the cumulative total.

I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping that I am overthinking it.

I could not find much on the issue on StackOverflow, but this was the closest: python pandas conditional cumulative sum

The thing I cannot figure out is that my conditions are based on values in the index and contained in columns

Below is my data:

| Loan    | Origination | Balance | NCO Date  | NCO | As of Date | Age     (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1  | 1/31/2011   | 1000    | 1/31/2018 | 25  | 5/31/2019  | 100              | 84               |
| Loan 2  | 3/31/2011   | 2500    |           | 0   | 5/31/2019  | 98           |                  |
| Loan 3  | 5/31/2011   | 3000    | 1/31/2019 | 15  | 5/31/2019  | 96           | 92               |
| Loan 4  | 7/31/2011   | 2500    |           | 0   | 5/31/2019  | 94           |                  |
| Loan 5  | 9/30/2011   | 1500    | 3/31/2019 | 35  | 5/31/2019  | 92           | 90               |
| Loan 6  | 11/30/2011  | 2500    |           | 0   | 5/31/2019  | 90           |                  |
| Loan 7  | 1/31/2012   | 1000    | 5/31/2019 | 5   | 5/31/2019  | 88           | 88               |
| Loan 8  | 3/31/2012   | 2500    |           | 0   | 5/31/2019  | 86           |                  |
| Loan 9  | 5/31/2012   | 1000    |           | 0   | 5/31/2019  | 84           |                  |
| Loan 10 | 7/31/2012   | 1250    |           | 0   | 5/31/2019  | 82           |                  |

In Excel, I would calculate this total using the following formulas:

Outstanding Balance Line: =SUMIFS(Balance,Age (Months),Reference Age)

Cumulative NCO: =SUMIFS(NCO,Age (Months),>=Reference Age,NCO Age (Months),<=&Reference Age)

Data:

| Reference Age       | 85    | 90    | 95   | 100  
|---------------------|-------|-------|------|------
| Outstanding Balance | 16500 | 13000 | 6500 | 1000 
| Cumulative NCO      | 25    | 60    | 40   | 25

The goal here is to include things in Outstanding Balance that are old enough to have an observation for NCO. And NCOs are the total amount that have occurred up until that point for those loans outstanding.

EDIT:

I have gotten a calculation this way. But is this the most efficient?

age_bins = list(np.arange(85, 101, 5))
final_df = pd.DataFrame()
df.fillna(value=0, inplace=True)
df["NCO Age (Months)"] = df["NCO Age (Months)"].astype(int)

for x in age_bins:

    age = x

    nco = df.loc[(df["Age (Months)"] >= x) & (df["NCO Age (Months)"] <= x), "NCO"].sum()

    bal = df.loc[(df["Age (Months)"] >= x), "Balance"].sum()

    temp_df = pd.DataFrame(
        data=[[age, nco, bal]],
        columns=["Age", "Cumulative NCO", "Outstanding Balance"],
        index=[age],
    )

    final_df = final_df.append(temp_df, sort=True)

Upvotes: 9

Answers (3)

Serge Ballesta

Reputation: 148880

You use a complex conditions depending on variables. It is easy to find a vectorized way for simple cumulative sums, but I cannot imagine a nice way for the Cumulative NCO.

So I would revert to Python comprehensions:

data = [
    { 'Reference Age': ref,
      'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
      'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
                   'NCO'].sum() }
    for ref in [85, 90, 95, 100]]

result = pd.DataFrame(data).set_index('Reference Age').T

It produces:

Reference Age          85     90    95    100
Cumulative NCO          25     60    40    25
Outstanding Balance  16500  13000  6500  1000

Upvotes: 2

Waylon Walker

Reputation: 563

Not sure that I completly follow the exact logic you are going for, but you can accomplish a sumifs with the combination of pandas query and groupby.

Example

import pandas as pd
import numpy as np

age = np.random.randint(85, 100, 50)
balance = np.random.randint(1000, 2500, 50)
nco = np.random.randint(85, 100, 50)

df = pd.DataFrame({'age': age, 'balance': balance, 'nco':nco})


df['reference_age'] = df['age'].apply(lambda x: 5 * round(float(x)/5))

outstanding_balance = (
   df
   .query('age >= reference_age')
   .groupby('reference_age')
   [['balance']]
   .sum()
   .rename(columns={'balance': 'Outstanding Balance'}
   )

cumulative_nco = (
   df
   .query('age < reference_age')
   .groupby('reference_age')
   [['nco']]
   .sum()
   .rename(columns={'nco': 'cumulative nco'})
   .cumsum()
   )


result = outstanding_balance.join(cumulative_sum).T

result

reference_age            85       90       95
Outstanding Balance  2423.0  16350.0  13348.0
cumulative nco          NaN    645.0   1107.0

Upvotes: 0

chuni0r

Reputation: 173

You could try to build groups of loans in a given age range using pd.cut and use groupby afterwards. Something like this:

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [7, 8, 9, 10, 11]], index=['age', 'value']).T
df['groups'] = pd.cut(df.age, [0, 1, 3, 5]) # define bins (0,1], (1,3], (3,5]
df.groupby('groups')['value'].sum()

Upvotes: 0

Conditional Cumulative Sums in Pandas

Answers (3)

Example

result

Related Questions