Reputation: 395
I have the following dataframe and like to apply a customize the cumulative formulas to the column.
How should I put them in a function and apply them to a new column? Thank you very much!
Excel Formula:
new column =(previous period value + 1) * (1 + current period of test_A's value) ^ (1/12) -1
# For example: 2021-12-11 as current period
=(0.01 + 1 ) * (1 + 0.01 ) ^ (1/12) -1 = 0.0016598
Upvotes: 0
Views: 456
Reputation: 862851
I think here is best use numba, if performance is important:
df = pd.DataFrame({
"test_A": [0.01] * 1000,
})
from numba import jit
@jit(nopython=True)
def f(a):
d = np.empty(a.shape)
p = 0
for i in range(a.shape[0]):
p = (1 + p) * (1 + a[i]) ** (1/12) - 1
d[i] = p
return d
df['Cumulative1'] = f(df['test_A'].to_numpy())
print (df)
test_A Cumulative Cumulative1
0 0.01 0.000830 0.000830
1 0.01 0.001660 0.001660
2 0.01 0.002491 0.002491
3 0.01 0.003322 0.003322
4 0.01 0.004155 0.004155
.. ... ... ...
995 0.01 1.283884 1.283884
996 0.01 1.285778 1.285778
997 0.01 1.287675 1.287675
998 0.01 1.289572 1.289572
999 0.01 1.291472 1.291472
[1000 rows x 3 columns]
In [46]: %%timeit
...: df['Cumulative'] = np.nan # create a new column
...: prev = 0
...: for i, row in df.iterrows():
...: prev = (1 + prev) * (1 + row['test_A']) ** (1/12) - 1
...: df.loc[i, 'Cumulative'] = prev
...:
140 ms ± 998 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [47]: %%timeit
...: df['Cumulative1'] = f(df['test_A'].to_numpy())
...:
133 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 1
Reputation: 1873
I would achieve this by iterating over rows.
df['Cumulative'] = np.nan # create a new column
prev = 0
for i, row in df.iterrows():
prev = (1 + prev) * (1 + row['test_A']) ** (1/12) - 1
df.loc[i, 'Cumulative'] = prev
Upvotes: 1