Reputation: 85
I think that my problem is best illustrated with an example.
The following are the columns I have as inputs.
Month  Monthly_Interest  Payment_Shortfall
    1               0.1                 20
    2               0.1                100
    3               0.1                105
    4               0.1                 50
    5               0.1                  5
    6               0.1                 30
    7               0.1                 70
This is the output that I need.
Month  Monthly_Interest  Payment_Shortfall  Amount_In_Arrears
    1               0.1                 20              20.00
    2               0.1                100             122.00
    3               0.1                105             239.20
    4               0.1                 50             313.12
    5               0.1                  5             349.43
    6               0.1                 30             414.38
    7               0.1                 70             525.81
This is a loan which is being underpaid each month by the Payment_Shortfall. The Payment_Shortfall builds up each month to form the Amount_In_Arrears. Interest is charged on the previous month's Amount_In_Arrears and is added to the current month's Amount_In_Arrears.
In other words: Amount_In_Arrears = Payment_Shortfall + Amount_In_Arrears(previous month) + Amount_In_Arrears(previous month) * Monthly_Interest
For example, for month 2: Amount_In_Arrears = 100 + 20 + 20 * 0.1 = 122, and for month 3: Amount_In_Arrears = 105 + 122 + 122 * 0.1 = 239.20.
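The recurrence above can be checked with a short plain-Python loop (a minimal sketch, using the shortfall figures from the example table to confirm the arithmetic):

```python
# Verify the recurrence: arrears = shortfall + previous + previous * interest
shortfall = [20, 100, 105, 50, 5, 30, 70]
monthly_interest = 0.1

arrears = []
previous = 0.0
for p in shortfall:
    previous = p + previous + previous * monthly_interest
    arrears.append(round(previous, 2))

print(arrears)  # [20.0, 122.0, 239.2, 313.12, 349.43, 414.38, 525.81]
```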
I have been trying to do this in Pandas and I just can't seem to figure it out. I will need to apply it to a fairly large dataframe (about 3 million rows), so efficiency is a consideration.
My data contain a number of loans like this, so I will also need the solution to reset for each loan (probably using groupby()); however, I think it's best to deal with one issue at a time.
Many thanks in advance for any help!
Upvotes: 2
Views: 133
Reputation: 2110
As far as I know there is no efficient way to calculate this in plain pandas (but if there is, I'd very much like to know). I know of two options for getting fast computations in Python for this kind of task: Cython (specially annotated Python that is compiled to C) and Numba (a just-in-time compiler for plain Python).
Since I don't know much about Cython, I'll give you an example using Numba.
With Numba you write a separate function with a Numba decorator. Numba plays nicely with NumPy but doesn't know pandas, so we have to feed NumPy arrays into the function:
import numba
import numpy as np

@numba.jit(nopython=True)
def compute_total_amount(interest, amount):
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount

df['Amount_In_Arrears'] = compute_total_amount(df['Monthly_Interest'].values,
                                               df['Payment_Shortfall'].values)
   Payment_Shortfall  Monthly_Interest  Amount_In_Arrears
0                 20               0.1           20.00000
1                100               0.1          122.00000
2                105               0.1          239.20000
3                 50               0.1          313.12000
4                  5               0.1          349.43200
5                 30               0.1          414.37520
6                 70               0.1          525.81272
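The per-loan reset mentioned in the question can be handled by running the recurrence once per group. A minimal sketch, with a hypothetical Loan_ID column, a plain-Python copy of the function (so the sketch runs without Numba installed), and the assumption that each loan's rows are contiguous and sorted by month:

```python
import numpy as np
import pandas as pd

def compute_total_amount(interest, amount):
    # Same recurrence as above, without the numba decorator.
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount

df = pd.DataFrame({
    'Loan_ID':           [1, 1, 1, 2, 2],          # hypothetical loan identifier
    'Monthly_Interest':  [0.1] * 5,
    'Payment_Shortfall': [20, 100, 105, 20, 100],
})

# Run the recurrence once per loan; sort=False preserves the original
# row order (assumes each loan's rows are contiguous and month-sorted).
parts = [
    compute_total_amount(g['Monthly_Interest'].values,
                         g['Payment_Shortfall'].values)
    for _, g in df.groupby('Loan_ID', sort=False)
]
df['Amount_In_Arrears'] = np.concatenate(parts)
print(df['Amount_In_Arrears'].round(2).tolist())  # [20.0, 122.0, 239.2, 20.0, 122.0]
```

The list comprehension keeps each per-loan array separate, so the running total resets at every new Loan_ID; with a Numba-compiled function the per-group calls stay fast.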
EDIT: added timings for Numba vs. plain Python (1M rows):
df = pd.DataFrame({'Payment_Shortfall': np.random.random(10**6) * 10})
df['Monthly_Interest'] = -0.01
def compute_total_amount_python(interest, amount):
    n = len(interest)
    total_amount = np.empty(n)
    total_amount[0] = amount[0]
    for i in range(1, n):
        total_amount[i] = total_amount[i-1] * (1 + interest[i]) + amount[i]
    return total_amount
%timeit compute_total_amount_python(df['Monthly_Interest'].values, df['Payment_Shortfall'].values)
729 ms ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit compute_total_amount(df['Monthly_Interest'].values, df['Payment_Shortfall'].values)
2.7 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
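On the "if there is, I'd very much like to know" point: this is a first-order linear recurrence, so it can in principle be vectorized in plain NumPy with cumprod/cumsum. A sketch, with a serious caveat: the cumulative product over- or underflows for long series unless the rates are very close to zero, so for millions of rows at realistic rates the Numba loop above remains the safe choice:

```python
import numpy as np

def compute_total_amount_vectorized(interest, amount):
    # Let g[i] = prod_{k=1..i} (1 + interest[k]) with g[0] = 1.
    # Dividing total[i] = total[i-1]*(1+interest[i]) + amount[i] by g[i]
    # turns the recurrence into a cumulative sum:
    #   total[i] = g[i] * cumsum(amount / g)[i]
    factors = 1.0 + np.asarray(interest, dtype=float)
    factors[0] = 1.0                  # total[0] = amount[0] by definition
    g = np.cumprod(factors)           # WARNING: overflows for long series
    return g * np.cumsum(np.asarray(amount, dtype=float) / g)

interest = np.full(7, 0.1)
amount = np.array([20, 100, 105, 50, 5, 30, 70], dtype=float)
print(compute_total_amount_vectorized(interest, amount).round(5))
```

On the seven-row example this reproduces 20, 122, 239.2, ..., 525.81272; it is a sketch for short series only, not a drop-in replacement for the compiled loop.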
Upvotes: 1