Olga
Olga

Reputation: 13

How to replace slow 'apply' method in pandas DataFrame

I have a DataFrame with currencies transactions:

import pandas as pd

data = [[1653663281618, -583.8686, 'USD'],
       [1653741652125, -84.0381, 'USD'],
       [1653776860252, -33.8723, 'CHF'],
       [1653845294504, -465.4614, 'CHF'],
       [1653847155140, 22.285, 'USD'],
       [1653993629537, -358.04640000000006, 'USD']]

df = pd.DataFrame(data = data, columns = ['time', 'qty', 'currency_1'])

I need to add new column "balance" which would calculate the sum of the column 'qty' for all previous transactions. I have a simple function:

def balance(row):
    table = df[(df['time'] < row['time']) & (df['currency_1'] == row['currency_1'])]
    return table['qty'].sum()

df['balance'] = df.apply(balance, axis = 1)

But my real DataFrame is very large and .apply method works extremely slow. Is it a way to avoid using apply function in this case? Something like np.where?

Upvotes: 0

Views: 108

Answers (1)

Rabinzel
Rabinzel

Reputation: 7923

You could just use pandas cumsum here:

EDIT After adding a condition: I don't know how transform performs compared to apply, I'd say just try it on your real data. Can't think of an easier solution for the moment.

df['balance'] = df.groupby('currency_1')['qty'].transform(lambda x: x.shift().cumsum())
print(df)

            time       qty currency_1   balance
0  1653663281618 -583.8686        USD       NaN
1  1653741652125  -84.0381        USD -583.8686
2  1653776860252  -33.8723        CHF       NaN
3  1653845294504 -465.4614        CHF  -33.8723
4  1653847155140   22.2850        USD -667.9067
5  1653993629537 -358.0464        USD -645.6217

old answer:

df['Balance'] = df['qty'].shift(fill_value=0).cumsum()
print(df)

            time       qty currency_1    Balance
0  1653663281618 -583.8686        USD     0.0000
1  1653741652125  -84.0381        USD  -583.8686
2  1653776860252  -33.8723        USD  -667.9067
3  1653845294504 -465.4614        USD  -701.7790
4  1653847155140   22.2850        USD -1167.2404
5  1653993629537 -358.0464        USD -1144.9554

Upvotes: 1

Related Questions