Sasiwut Chaiyadecha
Sasiwut Chaiyadecha

Reputation: 193

How to duplicate row in Pandas DataFrame

I am seeking for pythonic way to deal with Pandas DataFrame. Suppose I have DataFrame looks like below:

Account Stage Outstanding Installment EIR
A 1 10000 100 0.07
B 2 50000 500 0.04
C 3 10000 100 0.07

I am trying to do the amortization table from the given information by stage. For example:

Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized

I have SAS Code to perform such a logic explained earlier per below:

data want;
set have;

if Stage = 1 then do;
    do Term = 1 to 12;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 2 then do;
    do Term = 1 to Term;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 3 then do;
    Outstanding = Outstanding;
    output;
    end;

run;

After run the code will provide output table looks like below (the number is just mock-up):

Account Stage Outstanding Installment EIR Term
A 1 10000 100 0.07 1
A 1 9000 100 0.07 2
A 1 8000 100 0.07 3
A 1 ... ... ... ...
A 1 2000 100 0.07 12
B 2 50000 500 0.04 1
B 2 49000 500 0.04 2
B 2 48000 500 0.04 3
B 2 ... ... ... ...
B 2 125 500 0.04 48
C 3 10000 100 0.07 1

I have the python code to the same but I think it is not efficient. I have somethings like:

# Amortization function
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12   
    nextBalance = os + interestAmount - payment
    return nextBalance

Then, I used the for-loop to call the function account by account and used np.repeat() to repeat the information I needed.

result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calcultion account: {i}')
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'])
    payment = float(accountTable['Installment'])
    amount = float(accountTable['Outstanding'])

    if int(accountTable['Stage']) <= 2:
        while amount > 0:
            amount = balances(rate, payment, amount)
            amortization.append(amount)
            if amortization[-1] <= 0:
                amortization.pop(-1)
            amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
            amortizationTable['Outstanding'] = amortization
            amortizationTable['Term'] = amortizationTable.index + 1
            result.append(amortizationTable)

I found it extremely slow compared to SAS Programming. Any suggestion to improve the speed or make it more pythonic way.

Thank you.

Upvotes: -1

Views: 191

Answers (1)

Abhishek Jain
Abhishek Jain

Reputation: 630

Try this:

import pandas as pd
df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07],
                   })


def computeBal(bal, eir, installment):
    intt = bal * eir / 12 / 100
    next_bal = bal + intt - installment
    return next_bal


def processAccount(df_acc):
    acc = df_acc['acc'].values[0]
    stg = int(df_acc['stage'])
    bal = float(df_acc['bal'])
    eir = float(df_acc['eir'])
    installment = float(df_acc['installment'])

    amort = []
    amort.append(bal)
    if stg == 1:
        for _ in range(1, 12):
            bal = computeBal(bal, eir, installment)
            amort.append(round(bal, 2))
    elif stg == 2:
        while bal > 0:
            bal = computeBal(bal, eir, installment)
            if bal > 0:
                amort.append(round(bal, 2))

    out = pd.DataFrame(amort)
    out['acc'] = acc
    out['stage'] = stg
    out['installment'] = installment
    out['eir'] = eir
    out.reset_index(inplace=True)
    out.rename(columns={0: 'bal', 'index': 'term'}, inplace=True)
    out['term'] += 1

    return out[['acc', 'stage', 'bal', 'installment', 'eir', 'term']]


result = dict()
for acc in df['acc'].unique():
    df_acc = df.loc[df['acc'] == acc, :].copy()
    result[acc] = processAccount(df_acc)


out = pd.concat(result).reset_index(drop=True)
out

Upvotes: -1

Related Questions