Reputation: 85
I think that my problem is best illustrated with an example.
The following are the columns I have as inputs.
Month  Monthly_Interest  Payment_Shortfall
    1               0.1                 20
    2               0.1                100
    3               0.1                105
    4               0.1                 50
    5               0.1                  5
    6               0.1                 30
    7               0.1                 70
This is the output that I need.
Month  Monthly_Interest  Payment_Shortfall  Amount_In_Arrears
    1               0.1                 20              20.00
    2               0.1                100             122.00
    3               0.1                105             239.20
    4               0.1                 50             313.12
    5               0.1                  5             349.43
    6               0.1                 30             414.38
    7               0.1                 70             525.81
This is a loan which is being underpaid each month by the Payment_Shortfall. The Payment_Shortfall builds up each month to form the Amount_In_Arrears. Interest is charged on the previous month's Amount_In_Arrears and is added to the current month's Amount_In_Arrears.
In other words: Amount_In_Arrears = Payment_Shortfall + Amount_In_Arrears(previous month) + Amount_In_Arrears(previous month) * Monthly_Interest
For example, for month 2: Amount_In_Arrears = 100 + 20 + 20 * 0.1 = 122, and for month 3: Amount_In_Arrears = 105 + 122 + 122 * 0.1 = 239.20.
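The recurrence above can be checked with a short plain-Python loop (a minimal sketch, using the shortfall figures from the example table to confirm the arithmetic):

```python
# Verify the recurrence: arrears = shortfall + previous + previous * interest
shortfall = [20, 100, 105, 50, 5, 30, 70]
monthly_interest = 0.1

arrears = []
previous = 0.0
for p in shortfall:
    previous = p + previous + previous * monthly_interest
    arrears.append(round(previous, 2))

print(arrears)  # [20.0, 122.0, 239.2, 313.12, 349.43, 414.38, 525.81]
```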
I have been trying to do this in Pandas and I just can't seem to figure it out. I will need to apply it to a fairly large dataframe (about 3 million rows), so efficiency is a consideration.
My data contain a number of loans like this, so I will also need the solution to reset for each loan (probably using groupby()); however, I think it's best to deal with one issue at a time.
Many thanks in advance for any help!
Upvotes: 2
Views: 133
Reputation: 2110
As far as I know there is no efficient way to calculate this in plain pandas (but if there is, I'd very much like to know). I know of two options for getting fast computations in Python for this kind of task: Cython (specially annotated Python that is compiled to C) and Numba (a just-in-time compiler for plain Python).
Since I don't know much about Cython, I'll give you an example using Numba.
With Numba you write a separate function with a Numba decorator. Numba plays nicely with NumPy but doesn't know pandas, so we have to feed NumPy arrays into the function:
import numba
import numpy as np

@numba.jit(nopython=True)
def compute_total_amount(interest, amount):
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount

df['Amount_In_Arrears'] = compute_total_amount(df['Monthly_Interest'].values,
                                               df['Payment_Shortfall'].values)
   Payment_Shortfall  Monthly_Interest  Amount_In_Arrears
0                 20               0.1           20.00000
1                100               0.1          122.00000
2                105               0.1          239.20000
3                 50               0.1          313.12000
4                  5               0.1          349.43200
5                 30               0.1          414.37520
6                 70               0.1          525.81272
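The per-loan reset mentioned in the question can be handled by running the recurrence once per group. A minimal sketch, with a hypothetical Loan_ID column, a plain-Python copy of the function (so the sketch runs without Numba installed), and the assumption that each loan's rows are contiguous and sorted by month:

```python
import numpy as np
import pandas as pd

def compute_total_amount(interest, amount):
    # Same recurrence as above, without the numba decorator.
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount

df = pd.DataFrame({
    'Loan_ID':           [1, 1, 1, 2, 2],          # hypothetical loan identifier
    'Monthly_Interest':  [0.1] * 5,
    'Payment_Shortfall': [20, 100, 105, 20, 100],
})

# Run the recurrence once per loan; sort=False preserves the original
# row order (assumes each loan's rows are contiguous and month-sorted).
parts = [
    compute_total_amount(g['Monthly_Interest'].values,
                         g['Payment_Shortfall'].values)
    for _, g in df.groupby('Loan_ID', sort=False)
]
df['Amount_In_Arrears'] = np.concatenate(parts)
print(df['Amount_In_Arrears'].round(2).tolist())  # [20.0, 122.0, 239.2, 20.0, 122.0]
```

The list comprehension keeps each per-loan array separate, so the running total resets at every new Loan_ID; with a Numba-compiled function the per-group calls stay fast.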
EDIT: added timings for Numba vs. plain Python (1M rows):
df = pd.DataFrame({'Payment_Shortfall': np.random.random(10**6) * 10})
df['Monthly_Interest'] = -0.01
def compute_total_amount_python(interest, amount):
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount
%timeit compute_total_amount_python(df['Monthly_Interest'].values, df['Payment_Shortfall'].values)
729 ms ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit compute_total_amount(df['Monthly_Interest'].values, df['Payment_Shortfall'].values)
2.7 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
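On the "if there is, I'd very much like to know" point: this is a first-order linear recurrence, so it can in principle be vectorized in plain NumPy with cumprod/cumsum. A sketch, with a serious caveat: the cumulative product over- or underflows for long series unless the rates are very close to zero, so for millions of rows at realistic rates the Numba loop above remains the safe choice:

```python
import numpy as np

def compute_total_amount_vectorized(interest, amount):
    # Let g[i] = prod_{k=1..i} (1 + interest[k]) with g[0] = 1.
    # Dividing total[i] = total[i-1]*(1+interest[i]) + amount[i] by g[i]
    # turns the recurrence into a cumulative sum:
    #   total[i] = g[i] * cumsum(amount / g)[i]
    factors = 1.0 + np.asarray(interest, dtype=float)
    factors[0] = 1.0                  # total[0] = amount[0] by definition
    g = np.cumprod(factors)           # WARNING: overflows for long series
    return g * np.cumsum(np.asarray(amount, dtype=float) / g)

interest = np.full(7, 0.1)
amount = np.array([20, 100, 105, 50, 5, 30, 70], dtype=float)
print(compute_total_amount_vectorized(interest, amount).round(5))
```

On the seven-row example this reproduces 20, 122, 239.2, ..., 525.81272; it is a sketch for short series only, not a drop-in replacement for the compiled loop.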
Upvotes: 1