fjurt
fjurt

Reputation: 793

Calculate cumulative return over x days from defined start point in DataFrame

I have a return DataFrame with daily return data and a start DataFrame which defines the start dates for calculating the cumulative returns over x days.

returns:
            01K     02K     03K     04K
Dates               
2021-01-01  1.02    1.05    1.02    1.01
2021-01-02  0.97    0.99    1.02    1.06
2021-01-03  1.03    1.07    0.98    1.02
2021-01-04  0.96    1.02    1.03    0.98
2021-01-05  1.02    1.02    1.09    1.03
2021-01-06  1.06    0.95    0.96    0.99

start:
            01K 02K 03K 04K
Dates               
2021-01-01  1   0   0   0
2021-01-02  0   1   0   0
2021-01-03  0   0   0   0
2021-01-04  1   0   1   0
2021-01-05  0   0   1   0
2021-01-06  0   0   0   1

I try to calculate the cumulative return (cum_returns) from the start day until two days later and I want to assign that value to the location of the start day in the DataFrame. For example:

At the end of the DataFrame which no return information for all three days, I try to get an NaN value instead of the cumulative return until that date. This is what I try to get:

cum_returns:
            01K     02K     03K     04K
Dates               
2021-01-01  1.019   0.000   0.000   0.000
2021-01-02  0.000   1.080   0.000   0.000
2021-01-03  0.000   0.000   0.000   0.000
2021-01-04  1.038   0.000   1.078   0.000
2021-01-05  0.000   0.000   NaN     0.000
2021-01-06  0.000   0.000   0.000   NaN

For reproducibility:

import pandas as pd
import numpy as np

returns = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    '01K':[0.02, -0.03, 0.03, -0.04, 0.02, 0.06], 
    '02K':[0.05, -0.01, 0.07, 0.02, 0.02, -0.05], 
    '03K':[0.02, 0.02, -0.02, 0.03, 0.09, -0.04], 
    '04K':[0.01, 0.06, 0.02, -0.02, 0.03, -0.01]}) 
returns = returns.set_index('Dates')
returns = returns + 1

start = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    '01K':[1, 0, 0, 1, 0, 0], 
    '02K':[0, 1, 0, 0, 0, 0], 
    '03K':[0, 0, 0, 1, 1, 0], 
    '04K':[0, 0, 0, 0, 0, 1]}) 
start = start.set_index('Dates')

Thank you very much

Upvotes: 0

Views: 219

Answers (1)

Paul
Paul

Reputation: 1887

Try:

returns.shift(-2) * returns.shift(-1) * returns * start

This returns:

            01K         02K         03K         04K
Dates               
2021-01-01  1.019082    0.000000    0.000000    0.0
2021-01-02  0.000000    1.080486    0.000000    0.0
2021-01-03  0.000000    0.000000    0.000000    0.0
2021-01-04  1.037952    0.000000    1.077792    0.0
2021-01-05  NaN         NaN         NaN         NaN
2021-01-06  NaN         NaN         NaN         NaN

UPDATE

If you want less decimals, simply add round:

(returns.shift(-2) * returns.shift(-1) * returns * start).round(3)

            01K     02K     03K     04K
Dates               
2021-01-01  1.019   0.000   0.000   0.0
2021-01-02  0.000   1.080   0.000   0.0
2021-01-03  0.000   0.000   0.000   0.0
2021-01-04  1.038   0.000   1.078   0.0
2021-01-05  NaN     NaN     NaN     NaN
2021-01-06  NaN     NaN     NaN     NaN

UGLY UPDATE

This is not the prettiest or most pythonic solution, but I don't have the headspace right now to present you a better one, perhaps in a bit. This should work though:

for i in range(3):
    if i == 0:
        df = returns.copy()
    else:
        df = returns.shift(-i) * df
(df * start).round(3)

You can change range(3) to range(20) in your case.

Upvotes: 2

Related Questions