Ralf Klüber

Reputation: 61

How to vectorize a cumulative operation in Pandas

I read the answer to "How to vectorize an operation that uses previous values?", but I could not work out how to apply it to my own problem:

Is there a way to vectorize the Value End Of Period (VEoP) column?

import pandas as pd

terms = pd.date_range(start = '2022-01-01', periods=12, freq='YS', normalize=True)
df = pd.DataFrame({
    'Return':   [1.063, 1.053, 1.008, 0.98, 1.04, 1.057, 1.073, 1.027, 1.025, 1.068, 1.001, 0.983],
    'Cashflow': [6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1]
    },index=terms.strftime('%Y'))
df.index.name = 'Date'

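df['VEoP'] = 0.0  # float column, since the results are not integers
veop_col = df.columns.get_loc('VEoP')
for y in range(len(df)):
    prev = 0 if y == 0 else df.iloc[y - 1, veop_col]
    # a single .iloc call avoids chained assignment, which may not write back to df
    df.iloc[y, veop_col] = (prev + df['Cashflow'].iloc[y]) * df['Return'].iloc[y]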

df

    Return  Cashflow    VEoP
Date                          
2022  1.0630         6  6.3780
2023  1.0530         0  6.7160
2024  1.0080         0  6.7698
2025  0.9800         8 14.4744
2026  1.0400        -1 14.0133
2027  1.0570        -1 13.7551
2028  1.0730        -1 13.6862
2029  1.0270        -1 13.0288
2030  1.0250        -1 12.3295
2031  1.0680        -1 12.0999
2032  1.0010        -1 11.1110
2033  0.9830        -1  9.9391

Upvotes: 6

Views: 332

Answers (1)

EliadL

Reputation: 7068

Vectorization is limited when each value relies on the one before it, since it can't be parallelized.

Therefore a non-vectorized solution with accumulate:

from itertools import accumulate

df['VEoP'] = list(accumulate(
    df.to_records(),
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    initial=0,
))[1:]

performs just as well as this numpy "vectorization":

import numpy as np

df['VEoP'] = np.frompyfunc(
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    2, 1,  # nin, nout
).accumulate(
    [0, *df.to_records()],
    dtype=object,  # temporary conversion
).astype(float)[1:]

which can be broken down into smaller pieces of logic:

def get_ufunc(func, nin, nout):  return np.frompyfunc(func, nin, nout)
def get_binary_ufunc(func):      return get_ufunc(func, nin=2, nout=1)
def accum(func):                 return get_binary_ufunc(func).accumulate
def accum_float(func, x):        return accum(func)(x, dtype=object).astype(float)
def accum_float_from_0(func, x): return accum_float(func, [0, *x])[1:]

def calc_veop(prev_veop, new):   return (prev_veop + new.Cashflow) * new.Return
def accum_veop(records):         return accum_float_from_0(calc_veop, records)

df['VEoP'] = accum_veop(df.to_records())

For details, see the NumPy documentation for np.frompyfunc and np.ufunc.accumulate.
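As a side note, this particular recurrence is linear in the previous value, so it can be unrolled into cumulative products and sums. The following is only a sketch under assumptions, not a general replacement for the approaches above: it assumes no Return is ever zero (it divides by the cumulative product), and it may lose precision on long series where the cumulative product becomes very large or very small. The column name VEoP_closed is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question; the index is built directly from year strings.
df = pd.DataFrame({
    'Return':   [1.063, 1.053, 1.008, 0.98, 1.04, 1.057, 1.073, 1.027, 1.025, 1.068, 1.001, 0.983],
    'Cashflow': [6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1],
}, index=pd.Index([str(y) for y in range(2022, 2034)], name='Date'))

# Cumulative growth factor up to and including each period.
growth = df['Return'].cumprod()

# Growth factor up to the previous period (1.0 before the first period).
prev_growth = growth.shift(1, fill_value=1.0)

# Scale each cash flow back to the start, sum the scaled cash flows,
# then grow the running total forward to the end of each period.
# Assumption: no Return is zero, otherwise the division breaks down.
df['VEoP_closed'] = growth * (df['Cashflow'] / prev_growth).cumsum()
```

This follows from expanding VEoP[t] = (VEoP[t-1] + Cashflow[t]) * Return[t] term by term: each cash flow ends up multiplied by every return from its own period onward, which is growth[t] / prev_growth[k] for a cash flow in period k.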

Upvotes: 3
