Reputation: 185
I currently have two data frames and wish to create a third data frame based upon two conditions both being satisfied: TRUE if the corresponding value in data frame 1 has just broken through 2 AND the value of data frame 2 is <= 0.2, else FALSE.
df1
df2
And the code should create df3 as follows:
i.e condition is only TRUE for 5y on 13-aug-17 as 5y has broken above 2 AND the value in df2 <=0.2.
The purpose of this is to replicate what has been done here mean reversion strategy over an entire data frame, rather than just for a single time-series, i.e for each of the steps in the example do the process at the data frame, rather than time series level.
So the question is how to apply the shift function below at the dataframe rather than column level.
df4['short entry'] = ((df4.zScore < - entryZscore) & ( df4.zScore.shift(1) > - entryZscore)&(df4['hurst'] < hurstentry))
Upvotes: 1
Views: 99
Reputation: 331
Short answer:
df3 = ((df1 > -entryZscore) & (df1.shift(1) < -entryZscore) & (df2 < hurstentry))
I am not sure to understand the mean reversion strategy problem, but if entryZscore and hurstentry are the same for all columns, you can try this:
import pandas as pd
date_index = pd.date_range('2017-08-10', '2017-08-13')
cols = ['2y','5y','10y']
df1 = pd.DataFrame([[1.3,1.3,1.3],[1.4,1.4,1.4],[1.9,2.1,1.9],[1.9,1.9,1.9]],
columns=cols, index=date_index)
df2 = pd.DataFrame([[0.3,0.3,0.3],[0.1,0.1,0.1],[0.1,0.1,0.1], [0.3,0.3,0.3]],
columns=cols, index=date_index)
entryZscore = -2
hurstentry = 0.2
df3 = ((df1 > -entryZscore) & (df1.shift(1) < -entryZscore) & (df2 < hurstentry))
Output:
2y 5y 10y
2017-08-10 False False False
2017-08-11 False False False
2017-08-12 False True False
2017-08-13 False False False
Upvotes: 1
Reputation: 637
This might be something helpful
import pandas as pd
import numpy as np
df1 = pd.DataFrame([['11',1.3,1.3,1.3],['12',1.4,1.4,1.4],['13',1.9,2.1,1.9], ['14',1.9,1.9,1.9]])
df1.columns = ['date','2y','5y','10y']
df2 = pd.DataFrame([['11',0.3,0.3,0.3],['12',0.1,0.1,0.1],['13',0.1,0.1,0.1], ['14',0.3,0.3,0.3]])
df2.columns = ['date','2y','5y','10y']
df = pd.merge(df1, df2, on='date', suffixes=['_zscore','_hurst'])
entryZscore = -2
hurstentry = 0.2
for x in ['2y','5y','10y']:
df[x+'_short'] = ((df[x+'_zscore'] > -entryZscore) & ( df[x+'_zscore'].shift(1) < -entryZscore)&(df[x+'_hurst'] < hurstentry))
entries = ['date'] + [x+'_short' for x in ['2y','5y','10y']]
result = df[entries]
print result
output as
date 2y_short 5y_short 10y_short
0 11 False False False
1 12 False False False
2 13 False True False
3 14 False False False
I don't recommend approach above. It seems a bit messy. If possible, you could have three data frame. Each frame will hold data for single duration. i.e. 1 data frame for 5 year related data, 1 for 10 year and 1 for 2 year. You could create a function to parse single data frame. In this way, you could just copy the code from the mean reversion sample. Understanding the math is more crucial than handling data in complex ways. Hope it helps.
Upvotes: 0