Reputation: 13
I have a DataFrame of currency transactions:
import pandas as pd
data = [[1653663281618, -583.8686, 'USD'],
[1653741652125, -84.0381, 'USD'],
[1653776860252, -33.8723, 'CHF'],
[1653845294504, -465.4614, 'CHF'],
[1653847155140, 22.285, 'USD'],
[1653993629537, -358.04640000000006, 'USD']]
df = pd.DataFrame(data = data, columns = ['time', 'qty', 'currency_1'])
I need to add a new column "balance" that holds the sum of the column 'qty' over all previous transactions in the same currency. I have a simple function:
def balance(row):
    table = df[(df['time'] < row['time']) & (df['currency_1'] == row['currency_1'])]
    return table['qty'].sum()
df['balance'] = df.apply(balance, axis = 1)
But my real DataFrame is very large and the .apply method is extremely slow. Is there a way to avoid apply in this case? Something like np.where?
Upvotes: 0
Views: 108
Reputation: 7923
You could just use pandas cumsum here.
EDIT
After adding the per-currency condition:
I don't know how transform performs compared to apply; I'd say just try it on your real data. I can't think of an easier solution for the moment.
df['balance'] = df.groupby('currency_1')['qty'].transform(lambda x: x.shift().cumsum())
print(df)
time qty currency_1 balance
0 1653663281618 -583.8686 USD NaN
1 1653741652125 -84.0381 USD -583.8686
2 1653776860252 -33.8723 CHF NaN
3 1653845294504 -465.4614 CHF -33.8723
4 1653847155140 22.2850 USD -667.9067
5 1653993629537 -358.0464 USD -645.6217
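If the lambda turns out to be a bottleneck, the same per-currency running balance can be sketched without it: take the grouped cumulative sum and subtract the current row's qty. Note this gives 0.0 instead of NaN for the first transaction of each currency.

```python
import pandas as pd

data = [[1653663281618, -583.8686, 'USD'],
        [1653741652125, -84.0381, 'USD'],
        [1653776860252, -33.8723, 'CHF'],
        [1653845294504, -465.4614, 'CHF'],
        [1653847155140, 22.285, 'USD'],
        [1653993629537, -358.04640000000006, 'USD']]
df = pd.DataFrame(data, columns=['time', 'qty', 'currency_1'])

# cumsum includes the current row; subtracting qty leaves the sum
# of all earlier rows in the same currency group.
df['balance'] = df.groupby('currency_1')['qty'].cumsum() - df['qty']
print(df)
```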
Old answer (without the per-currency grouping):
df['Balance'] = df['qty'].shift(fill_value=0).cumsum()
print(df)
time qty currency_1 Balance
0 1653663281618 -583.8686 USD 0.0000
1 1653741652125 -84.0381 USD -583.8686
2 1653776860252 -33.8723 USD -667.9067
3 1653845294504 -465.4614 USD -701.7790
4 1653847155140 22.2850 USD -1167.2404
5 1653993629537 -358.0464 USD -1144.9554
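To actually compare transform against the row-wise apply from the question, here's a rough timing sketch. It assumes the sample data replicated a few hundred times; the duplicated timestamps change the computed values but not the cost profile, and absolute timings will vary by machine.

```python
import timeit
import pandas as pd

data = [[1653663281618, -583.8686, 'USD'],
        [1653741652125, -84.0381, 'USD'],
        [1653776860252, -33.8723, 'CHF'],
        [1653845294504, -465.4614, 'CHF'],
        [1653847155140, 22.285, 'USD'],
        [1653993629537, -358.04640000000006, 'USD']]
df = pd.DataFrame(data * 500, columns=['time', 'qty', 'currency_1'])

def balance(row):
    # scans the whole frame once per row -> O(n^2) overall
    table = df[(df['time'] < row['time']) & (df['currency_1'] == row['currency_1'])]
    return table['qty'].sum()

t_apply = timeit.timeit(lambda: df.apply(balance, axis=1), number=1)
t_transform = timeit.timeit(
    lambda: df.groupby('currency_1')['qty'].transform(lambda x: x.shift().cumsum()),
    number=1)
print(f'apply: {t_apply:.3f}s, transform: {t_transform:.3f}s')
```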
Upvotes: 1