Reputation: 193
I am seeking for pythonic way to deal with Pandas DataFrame. Suppose I have DataFrame looks like below:
Account | Stage | Outstanding | Installment | EIR |
---|---|---|---|---|
A | 1 | 10000 | 100 | 0.07 |
B | 2 | 50000 | 500 | 0.04 |
C | 3 | 10000 | 100 | 0.07 |
I am trying to do the amortization table from the given information by stage. For example:
Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized
I have SAS Code to perform such a logic explained earlier per below:
data want;
set have;
if Stage = 1 then do;
do Term = 1 to 12;
Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
if Outstanding < 0 then delete;
output;
end;
end;
else if Stage = 2 then do;
do Term = 1 to Term;
Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
if Outstanding < 0 then delete;
output;
end;
end;
else if Stage = 3 then do;
Outstanding = Outstanding;
output;
end;
run;
After run the code will provide output table looks like below (the number is just mock-up):
Account | Stage | Outstanding | Installment | EIR | Term |
---|---|---|---|---|---|
A | 1 | 10000 | 100 | 0.07 | 1 |
A | 1 | 9000 | 100 | 0.07 | 2 |
A | 1 | 8000 | 100 | 0.07 | 3 |
A | 1 | ... | ... | ... | ... |
A | 1 | 2000 | 100 | 0.07 | 12 |
B | 2 | 50000 | 500 | 0.04 | 1 |
B | 2 | 49000 | 500 | 0.04 | 2 |
B | 2 | 48000 | 500 | 0.04 | 3 |
B | 2 | ... | ... | ... | ... |
B | 2 | 125 | 500 | 0.04 | 48 |
C | 3 | 10000 | 100 | 0.07 | 1 |
I have the python code to the same but I think it is not efficient. I have somethings like:
# Amortization function
def balances(rate, payment, os):
interestAmount = os * rate / 100 / 12
nextBalance = os + interestAmount - payment
return nextBalance
Then, I used the for-loop to call the function account by account and used np.repeat()
to repeat the information I needed.
result = []
for i, account in enumerate(df['Account']):
if i % 5000 == 0:
print(f'Calcultion account: {i}')
accountTable = df[df['Account'] == account]
rate = float(accountTable['EIR'])
payment = float(accountTable['Installment'])
amount = float(accountTable['Outstanding'])
if int(accountTable['Stage']) <= 2:
while amount > 0:
amount = balances(rate, payment, amount)
amortization.append(amount)
if amortization[-1] <= 0:
amortization.pop(-1)
amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
amortizationTable['Outstanding'] = amortization
amortizationTable['Term'] = amortizationTable.index + 1
result.append(amortizationTable)
I found it extremely slow compared to SAS Programming. Any suggestion to improve the speed or make it more pythonic way.
Thank you.
Upvotes: -1
Views: 191
Reputation: 630
Try this:
import pandas as pd
df = pd.DataFrame({'acc': ['a', 'b', 'c'],
'stage': [1, 2, 3],
'bal': [10000, 50000, 10000],
'installment': [100, 500, 100],
'eir': [0.07, 0.04, 0.07],
})
def computeBal(bal, eir, installment):
intt = bal * eir / 12 / 100
next_bal = bal + intt - installment
return next_bal
def processAccount(df_acc):
acc = df_acc['acc'].values[0]
stg = int(df_acc['stage'])
bal = float(df_acc['bal'])
eir = float(df_acc['eir'])
installment = float(df_acc['installment'])
amort = []
amort.append(bal)
if stg == 1:
for _ in range(1, 12):
bal = computeBal(bal, eir, installment)
amort.append(round(bal, 2))
elif stg == 2:
while bal > 0:
bal = computeBal(bal, eir, installment)
if bal > 0:
amort.append(round(bal, 2))
out = pd.DataFrame(amort)
out['acc'] = acc
out['stage'] = stg
out['installment'] = installment
out['eir'] = eir
out.reset_index(inplace=True)
out.rename(columns={0: 'bal', 'index': 'term'}, inplace=True)
out['term'] += 1
return out[['acc', 'stage', 'bal', 'installment', 'eir', 'term']]
result = dict()
for acc in df['acc'].unique():
df_acc = df.loc[df['acc'] == acc, :].copy()
result[acc] = processAccount(df_acc)
out = pd.concat(result).reset_index(drop=True)
out
Upvotes: -1