Reputation: 11
I am trying to use python pandas to compute:
the 10-day and 30-day cumulative % performance of (stock ticker RTH "minus" stock ticker SPY) after a certain performance threshold in stock ticker USO occurs (>=10% in a 5-day window)
Here is my code:
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame
start = datetime.datetime(2012, 4, 1)
end = datetime.datetime.now()
dfcomp = web.DataReader(['USO', 'RTH', 'SPY'],'yahoo',start=start,end=end)['Adj Close']
dfcomp_daily_returns = dfcomp.pct_change()
dfcomp_daily_returns = dfcomp_daily_returns.dropna().copy()
dfcomp_daily_returns.head()
Symbols USO RTH SPY
Date
2012-04-03 -0.009243 -0.004758 -0.004089
2012-04-04 -0.020676 -0.007411 -0.009911
2012-04-05 0.010814 0.003372 -0.000501
2012-04-09 -0.007387 -0.006961 -0.011231
2012-04-10 -0.011804 -0.018613 -0.016785
I have added several more rows below so it might be easier to work with if someone can help; a snippet to rebuild this sample directly as a DataFrame follows the table.
Symbols USO RTH SPY
Date
2012-04-03 -0.009243 -0.004758 -0.004089
2012-04-04 -0.020676 -0.007411 -0.009911
2012-04-05 0.010814 0.003372 -0.000501
2012-04-09 -0.007387 -0.006961 -0.011231
2012-04-10 -0.011804 -0.018612 -0.016785
2012-04-11 0.012984 0.010345 0.008095
2012-04-12 0.011023 0.010970 0.013065
2012-04-13 -0.007353 -0.004823 -0.011888
2012-04-16 0.000766 0.004362 -0.000656
2012-04-17 0.011741 0.015440 0.014812
2012-04-18 -0.014884 -0.000951 -0.003379
2012-04-19 -0.002305 -0.006183 -0.006421
2012-04-20 0.011037 0.002632 0.001670
2012-04-23 -0.009139 -0.015513 -0.008409
2012-04-24 0.003587 -0.004364 0.003802
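In case it saves a step, here is a rough sketch (just illustrative, not part of my actual script, and sample_csv is only a throwaway name) that rebuilds the sample above as a DataFrame without the Yahoo download:
import io
import pandas as pd

# Sample daily returns copied verbatim from the table above
sample_csv = """Date,USO,RTH,SPY
2012-04-03,-0.009243,-0.004758,-0.004089
2012-04-04,-0.020676,-0.007411,-0.009911
2012-04-05,0.010814,0.003372,-0.000501
2012-04-09,-0.007387,-0.006961,-0.011231
2012-04-10,-0.011804,-0.018612,-0.016785
2012-04-11,0.012984,0.010345,0.008095
2012-04-12,0.011023,0.010970,0.013065
2012-04-13,-0.007353,-0.004823,-0.011888
2012-04-16,0.000766,0.004362,-0.000656
2012-04-17,0.011741,0.015440,0.014812
2012-04-18,-0.014884,-0.000951,-0.003379
2012-04-19,-0.002305,-0.006183,-0.006421
2012-04-20,0.011037,0.002632,0.001670
2012-04-23,-0.009139,-0.015513,-0.008409
2012-04-24,0.003587,-0.004364,0.003802"""

# Rebuild the daily-returns frame with Date as a DatetimeIndex
dfcomp_daily_returns = pd.read_csv(io.StringIO(sample_csv), index_col="Date", parse_dates=True)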
Upvotes: 0
Views: 294
Reputation: 2822
I think this is a solution to your question. Note that I copied your code up to the dropna() line, and have also used import numpy as np. You don't need to use from pandas import Series, DataFrame, especially as you have already used import pandas as pd.
The main computations use rolling, apply and where.
import numpy as np

# 5-day cumulative % return in USO: rolling product of (1 + daily return), minus 1
dfcomp_daily_returns["5_day_cum_%"] = dfcomp_daily_returns["USO"].rolling(5).apply(lambda x: np.prod(1 + x) - 1)
# daily RTH "minus" SPY return
dfcomp_daily_returns["RTH-SPY"] = dfcomp_daily_returns["RTH"] - dfcomp_daily_returns["SPY"]
# 10-day cumulative % of RTH-SPY over the next 10 days, kept only where the USO 5-day return exceeds 10%
dfcomp_daily_returns["output_10"] = dfcomp_daily_returns["RTH-SPY"].rolling(10).apply(lambda x: np.prod(1 + x) - 1).shift(-10).where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
# 30-day cumulative % of RTH-SPY over the next 30 days, same condition
dfcomp_daily_returns["output_30"] = dfcomp_daily_returns["RTH-SPY"].rolling(30).apply(lambda x: np.prod(1 + x) - 1).shift(-30).where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
I won't print the output, given that there are thousands of rows and the occurrences of 5_day_cum_% > 0.1 are irregular.
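As a side note, if your pandas version is 1.0 or newer, the forward-looking window can also be built directly with pandas.api.indexers.FixedForwardWindowIndexer instead of the .rolling(...).shift(-n) trick. A rough sketch for the 10-day case (fwd10 and alt_output_10 are just illustrative names; the forward window includes the current row, so an extra shift(-1) keeps it aligned with the shift(-10) version above):
import numpy as np
from pandas.api.indexers import FixedForwardWindowIndexer

# Forward-looking 10-row window starting at the current row (requires pandas >= 1.0)
fwd10 = FixedForwardWindowIndexer(window_size=10)

# Cumulative RTH-SPY return over the next 10 rows; shift(-1) so the window starts
# the day after the signal, matching rolling(10).apply(...).shift(-10) above
# (behaviour at the very end of the series may differ slightly)
alt_output_10 = (
    dfcomp_daily_returns["RTH-SPY"]
    .rolling(fwd10)
    .apply(lambda x: np.prod(1 + x) - 1)
    .shift(-1)
    .where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
)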
How this code works:
- 5_day_cum_% is calculated over a rolling 5-day window, as the product of (1 + daily return) in that window, minus 1.
- RTH-SPY is column RTH "minus" column SPY.
- output_10 and output_30 are calculated with the same rolling product on RTH-SPY, then using .shift() for forward rolling (a plain .rolling() window only looks backwards). This idea came from Daniel Manso here. Finally, .where() is used to keep these values only where 5_day_cum_% > 0.1 (i.e. 10%), returning np.nan otherwise.

Additions from comments
From your additions in the comments, here are two options for each request: one using pd.where again, the other using standard pandas boolean filtering (I'm not sure if it has an official name). In both cases the standard filtering is shorter.
A list of all the dates:
# Option 1: pd.where
list(dfcomp_daily_returns.where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan).dropna(subset=["5_day_cum_%"]).index)
# Option 2: standard pandas filtering
list(dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1].index)
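Since days with 5_day_cum_% > 0.1 tend to come in consecutive clusters, you may only want the first date of each cluster. A rough sketch of one way to do that (trigger and first_triggers are just illustrative names):
# True on days where the USO 5-day cumulative return exceeds 10%
trigger = dfcomp_daily_returns["5_day_cum_%"] > 0.1

# Keep only the first day of each consecutive run of True values
first_triggers = list(dfcomp_daily_returns[trigger & ~trigger.shift(1, fill_value=False)].index)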
A dataframe of only those with 5-day return greater than 10%:
# Option 1: pd.where
dfcomp_daily_returns.where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan).dropna(subset=["5_day_cum_%"])[["5_day_cum_%", "output_10", "output_30"]]
# Option 2: standard pandas row filtering
dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1][["5_day_cum_%", "output_10", "output_30"]]
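And if the end goal is the average post-signal performance of RTH "minus" SPY, the filtered rows can be summarised directly; a rough sketch (events is just an illustrative name):
# Average 10-day and 30-day forward RTH-SPY performance across all signal dates
events = dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1]
print(events[["output_10", "output_30"]].mean())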
Upvotes: 1