Reputation: 11
I am trying to use python pandas to compute:
the 10-day and 30-day cumulative % performance of (stock ticker RTH "minus" stock ticker SPY) after a certain performance threshold in stock ticker USO occurs (>=10% in a 5-day window)
Here is my code:
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame
start = datetime.datetime(2012, 4, 1)
end = datetime.datetime.now()
dfcomp = web.DataReader(['USO', 'RTH', 'SPY'],'yahoo',start=start,end=end)['Adj Close']
dfcomp_daily_returns = dfcomp.pct_change()
dfcomp_daily_returns = dfcomp_daily_returns.dropna().copy()
dfcomp_daily_returns.head()
Symbols USO RTH SPY
Date
2012-04-03 -0.009243 -0.004758 -0.004089
2012-04-04 -0.020676 -0.007411 -0.009911
2012-04-05 0.010814 0.003372 -0.000501
2012-04-09 -0.007387 -0.006961 -0.011231
2012-04-10 -0.011804 -0.018613 -0.016785
I have added several more rows below so it might be easier to work with if someone can help; a snippet to rebuild this sample directly as a DataFrame follows the table.
Symbols USO RTH SPY
Date
2012-04-03 -0.009243 -0.004758 -0.004089
2012-04-04 -0.020676 -0.007411 -0.009911
2012-04-05 0.010814 0.003372 -0.000501
2012-04-09 -0.007387 -0.006961 -0.011231
2012-04-10 -0.011804 -0.018612 -0.016785
2012-04-11 0.012984 0.010345 0.008095
2012-04-12 0.011023 0.010970 0.013065
2012-04-13 -0.007353 -0.004823 -0.011888
2012-04-16 0.000766 0.004362 -0.000656
2012-04-17 0.011741 0.015440 0.014812
2012-04-18 -0.014884 -0.000951 -0.003379
2012-04-19 -0.002305 -0.006183 -0.006421
2012-04-20 0.011037 0.002632 0.001670
2012-04-23 -0.009139 -0.015513 -0.008409
2012-04-24 0.003587 -0.004364 0.003802
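In case it saves a step, here is a rough sketch (just illustrative, not part of my actual script, and sample_csv is only a throwaway name) that rebuilds the sample above as a DataFrame without the Yahoo download:
import io
import pandas as pd

# Sample daily returns copied verbatim from the table above
sample_csv = """Date,USO,RTH,SPY
2012-04-03,-0.009243,-0.004758,-0.004089
2012-04-04,-0.020676,-0.007411,-0.009911
2012-04-05,0.010814,0.003372,-0.000501
2012-04-09,-0.007387,-0.006961,-0.011231
2012-04-10,-0.011804,-0.018612,-0.016785
2012-04-11,0.012984,0.010345,0.008095
2012-04-12,0.011023,0.010970,0.013065
2012-04-13,-0.007353,-0.004823,-0.011888
2012-04-16,0.000766,0.004362,-0.000656
2012-04-17,0.011741,0.015440,0.014812
2012-04-18,-0.014884,-0.000951,-0.003379
2012-04-19,-0.002305,-0.006183,-0.006421
2012-04-20,0.011037,0.002632,0.001670
2012-04-23,-0.009139,-0.015513,-0.008409
2012-04-24,0.003587,-0.004364,0.003802"""

# Rebuild the daily-returns frame with Date as a DatetimeIndex
dfcomp_daily_returns = pd.read_csv(io.StringIO(sample_csv), index_col="Date", parse_dates=True)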
Upvotes: 0
Views: 294
Reputation: 2822
I think this is a solution to your question. Note that I copied your code up to the dropna() line, and have also used import numpy as np. You don't need to use from pandas import Series, DataFrame, especially as you have already used import pandas as pd.
The main computations use rolling, apply and where.
import numpy as np

# 5-day cumulative % return in USO: rolling product of (1 + daily return), minus 1
dfcomp_daily_returns["5_day_cum_%"] = dfcomp_daily_returns["USO"].rolling(5).apply(lambda x: np.prod(1 + x) - 1)
# daily RTH "minus" SPY return
dfcomp_daily_returns["RTH-SPY"] = dfcomp_daily_returns["RTH"] - dfcomp_daily_returns["SPY"]
# 10-day cumulative % of RTH-SPY over the next 10 days, kept only where the USO 5-day return exceeds 10%
dfcomp_daily_returns["output_10"] = dfcomp_daily_returns["RTH-SPY"].rolling(10).apply(lambda x: np.prod(1 + x) - 1).shift(-10).where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
# 30-day cumulative % of RTH-SPY over the next 30 days, same condition
dfcomp_daily_returns["output_30"] = dfcomp_daily_returns["RTH-SPY"].rolling(30).apply(lambda x: np.prod(1 + x) - 1).shift(-30).where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
I won't print the output, given that there are thousands of rows and the occurrences of 5_day_cum_% > 0.1 are irregular.
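As a side note, if your pandas version is 1.0 or newer, the forward-looking window can also be built directly with pandas.api.indexers.FixedForwardWindowIndexer instead of the .rolling(...).shift(-n) trick. A rough sketch for the 10-day case (fwd10 and alt_output_10 are just illustrative names; the forward window includes the current row, so an extra shift(-1) keeps it aligned with the shift(-10) version above):
import numpy as np
from pandas.api.indexers import FixedForwardWindowIndexer

# Forward-looking 10-row window starting at the current row (requires pandas >= 1.0)
fwd10 = FixedForwardWindowIndexer(window_size=10)

# Cumulative RTH-SPY return over the next 10 rows; shift(-1) so the window starts
# the day after the signal, matching rolling(10).apply(...).shift(-10) above
# (behaviour at the very end of the series may differ slightly)
alt_output_10 = (
    dfcomp_daily_returns["RTH-SPY"]
    .rolling(fwd10)
    .apply(lambda x: np.prod(1 + x) - 1)
    .shift(-1)
    .where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan)
)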
How this code works:
- 5_day_cum_% is calculated over a rolling 5-day window, as the product of (1 + daily return) in that window, minus 1.
- RTH-SPY is column RTH "minus" column SPY.
- output_10 and output_30 are calculated with the same rolling product on RTH-SPY, then using .shift() for forward rolling (a plain .rolling() window only looks backwards). This idea came from Daniel Manso here. Finally, .where() is used to keep these values only where 5_day_cum_% > 0.1 (i.e. 10%), returning np.nan otherwise.

Additions from comments
From your additions in the comments, here are two options for each request: one using pd.where again, the other using standard pandas boolean filtering (I'm not sure if it has an official name). In both cases the standard filtering is shorter.
A list of all the dates:
# Option 1: pd.where
list(dfcomp_daily_returns.where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan).dropna(subset=["5_day_cum_%"]).index)
# Option 2: standard pandas filtering
list(dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1].index)
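Since days with 5_day_cum_% > 0.1 tend to come in consecutive clusters, you may only want the first date of each cluster. A rough sketch of one way to do that (trigger and first_triggers are just illustrative names):
# True on days where the USO 5-day cumulative return exceeds 10%
trigger = dfcomp_daily_returns["5_day_cum_%"] > 0.1

# Keep only the first day of each consecutive run of True values
first_triggers = list(dfcomp_daily_returns[trigger & ~trigger.shift(1, fill_value=False)].index)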
A dataframe of only those with 5-day return greater than 10%:
# Option 1: pd.where
dfcomp_daily_returns.where(dfcomp_daily_returns["5_day_cum_%"] > 0.1, np.nan).dropna(subset=["5_day_cum_%"])[["5_day_cum_%", "output_10", "output_30"]]
# Option 2: standard pandas row filtering
dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1][["5_day_cum_%", "output_10", "output_30"]]
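And if the end goal is the average post-signal performance of RTH "minus" SPY, the filtered rows can be summarised directly; a rough sketch (events is just an illustrative name):
# Average 10-day and 30-day forward RTH-SPY performance across all signal dates
events = dfcomp_daily_returns[dfcomp_daily_returns["5_day_cum_%"] > 0.1]
print(events[["output_10", "output_30"]].mean())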
Upvotes: 1