How to duplicate row in Pandas DataFrame

Question

I am seeking for pythonic way to deal with Pandas DataFrame. Suppose I have DataFrame looks like below:

Account	Stage	Outstanding	Installment	EIR
A	1	10000	100	0.07
B	2	50000	500	0.04
C	3	10000	100	0.07

I am trying to do the amortization table from the given information by stage. For example:

Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized

I have SAS Code to perform such a logic explained earlier per below:

data want;
set have;

if Stage = 1 then do;
    do Term = 1 to 12;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 2 then do;
    do Term = 1 to Term;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 3 then do;
    Outstanding = Outstanding;
    output;
    end;

run;

After run the code will provide output table looks like below (the number is just mock-up):

Account	Stage	Outstanding	Installment	EIR	Term
A	1	10000	100	0.07	1
A	1	9000	100	0.07	2
A	1	8000	100	0.07	3
A	1	...	...	...	...
A	1	2000	100	0.07	12
B	2	50000	500	0.04	1
B	2	49000	500	0.04	2
B	2	48000	500	0.04	3
B	2	...	...	...	...
B	2	125	500	0.04	48
C	3	10000	100	0.07	1

I have the python code to the same but I think it is not efficient. I have somethings like:

# Amortization function
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12   
    nextBalance = os + interestAmount - payment
    return nextBalance

Then, I used the for-loop to call the function account by account and used np.repeat() to repeat the information I needed.

result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calcultion account: {i}')
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'])
    payment = float(accountTable['Installment'])
    amount = float(accountTable['Outstanding'])

    if int(accountTable['Stage']) <= 2:
        while amount > 0:
            amount = balances(rate, payment, amount)
            amortization.append(amount)
            if amortization[-1] <= 0:
                amortization.pop(-1)
            amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
            amortizationTable['Outstanding'] = amortization
            amortizationTable['Term'] = amortizationTable.index + 1
            result.append(amortizationTable)

I found it extremely slow compared to SAS Programming. Any suggestion to improve the speed or make it more pythonic way.

Thank you.

How to duplicate row in Pandas DataFrame

Answers (1)

Related Questions