LLTeng
LLTeng

Reputation: 395

Apply Customize Cumulative Function to Pandas

I have the following dataframe and like to apply a customize the cumulative formulas to the column.

How should I put them in a function and apply them to a new column? Thank you very much!

Excel Formula:

new column =(previous period value + 1) * (1 + current period of test_A's value) ^ (1/12) -1

# For example: 2021-12-11 as current period 
=(0.01 + 1 ) * (1 + 0.01 ) ^ (1/12) -1 = 0.0016598

enter image description here

Upvotes: 0

Views: 456

Answers (2)

jezrael
jezrael

Reputation: 862851

I think here is best use numba, if performance is important:

df = pd.DataFrame({
    "test_A": [0.01] * 1000,
   
})

    
from numba import jit

@jit(nopython=True)
def f(a):
    d = np.empty(a.shape)
    p = 0
    for i in range(a.shape[0]):
        p = (1 + p) * (1 + a[i]) ** (1/12) - 1
        d[i] = p
    return d


df['Cumulative1'] = f(df['test_A'].to_numpy())
print (df)
     test_A  Cumulative  Cumulative1
0      0.01    0.000830     0.000830
1      0.01    0.001660     0.001660
2      0.01    0.002491     0.002491
3      0.01    0.003322     0.003322
4      0.01    0.004155     0.004155
..      ...         ...          ...
995    0.01    1.283884     1.283884
996    0.01    1.285778     1.285778
997    0.01    1.287675     1.287675
998    0.01    1.289572     1.289572
999    0.01    1.291472     1.291472

[1000 rows x 3 columns]

In [46]: %%timeit
    ...: df['Cumulative'] = np.nan  # create a new column
    ...: prev = 0
    ...: for i, row in df.iterrows():
    ...:     prev = (1 + prev) * (1 + row['test_A']) ** (1/12) - 1
    ...:     df.loc[i, 'Cumulative'] = prev
    ...:     
140 ms ± 998 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [47]: %%timeit 
    ...: df['Cumulative1'] = f(df['test_A'].to_numpy())
    ...: 
133 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Upvotes: 1

Kate Melnykova
Kate Melnykova

Reputation: 1873

I would achieve this by iterating over rows.

df['Cumulative'] = np.nan  # create a new column
prev = 0
for i, row in df.iterrows():
    prev = (1 + prev) * (1 + row['test_A']) ** (1/12) - 1
    df.loc[i, 'Cumulative'] = prev

Upvotes: 1

Related Questions