Reputation: 3275
I am running the following code to calculate for every dataframe row the number of positive days in the previous rows and the number of days in which the stock has beaten the S&P 500 index:
for offset in [1,5,15,30,45,60,75,90,120,150,
200,250,500,750,1000,1250,1500]:
asset['return_stock'] = (asset.Close - asset.Close.shift(1)) / (asset.Close.shift(1))
merged_data = pd.merge(asset, sp_500, on='Date')
total_positive_days=0
total_beating_sp_days=0
for index, row in merged_data.iterrows():
print(offset, index)
for i in range(0,offset):
if index-i-1>0:
if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']:
total_positive_days+=1
if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']:
total_beating_sp_days+=1
but it is quite slow. Is there a way to speed it up (possibly by somehow getting rid of the for loop)?
My dataset looks like this (merged_data follows):
Date Open_x High_x Low_x Close_x Adj Close_x Volume_x return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5 Pct_positive_15 Pct_beating_15 Pct_change_15 Pct_change_plus_15 Pct_positive_30 Pct_beating_30 Pct_change_30 Pct_change_plus_30 Open_y High_y Low_y Close_y Adj Close_y Volume_y return_sp
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562 1307.2 1257.6 NaN 0.039186 1302.066667 1252.633333 NaN 0.056579 1116.560059 1133.869995 1116.560059 1132.989990 1132.989990 3991400000 0.016043
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062 1307.2 1257.6 NaN 0.031268 1302.066667 1252.633333 NaN 0.056423 1132.660034 1136.630005 1129.660034 1136.520020 1136.520020 2491020000 0.003116
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519 1307.2 1257.6 NaN 0.058608 1302.066667 1252.633333 NaN 0.046115 1135.709961 1139.189941 1133.949951 1137.140015 1137.140015 4972660000 0.000546
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491 1307.2 1257.6 NaN 0.096428 1302.066667 1252.633333 NaN 0.050694 1136.270020 1142.459961 1131.319946 1141.689941 1141.689941 5270680000 0.004001
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379 1307.2 1257.6 NaN 0.088584 1302.066667 1252.633333 NaN 0.075713 1140.520020 1145.390015 1136.219971 1144.979980 1144.979980 4389590000 0.002882
asset follows:
Date Open High Low Close Adj Close Volume return_stock Pct_positive_1 Pct_beating_1 Pct_change_1 Pct_change_plus_1 Pct_positive_5 Pct_beating_5 Pct_change_5 Pct_change_plus_5
0 2010-01-04 30.490000 30.642857 30.340000 30.572857 26.601469 123432400 NaN 1311.0 1261.0 NaN -0.001726 1310.4 1260.8 NaN 0.018562
1 2010-01-05 30.657143 30.798571 30.464285 30.625713 26.647457 150476200 0.001729 1311.0 1261.0 0.001729 0.016163 1310.4 1260.8 NaN 0.032062
2 2010-01-06 30.625713 30.747143 30.107143 30.138571 26.223597 138040000 -0.015906 1311.0 1261.0 -0.015906 0.001852 1310.4 1260.8 NaN 0.001519
3 2010-01-07 30.250000 30.285715 29.864286 30.082857 26.175119 119282800 -0.001849 1311.0 1261.0 -0.001849 -0.006604 1310.4 1260.8 NaN 0.005491
4 2010-01-08 30.042856 30.285715 29.865715 30.282858 26.349140 111902700 0.006648 1311.0 1261.0 0.006648 0.008900 1310.4 1260.8 NaN 0.029379
sp_500 follows:
Date Open High Low Close Adj Close Volume return_sp
0 1999-12-31 1464.469971 1472.420044 1458.189941 1469.250000 1469.250000 374050000 NaN
1 2000-01-03 1469.250000 1478.000000 1438.359985 1455.219971 1455.219971 931800000 -0.009549
2 2000-01-04 1455.219971 1455.219971 1397.430054 1399.420044 1399.420044 1009000000 -0.038345
3 2000-01-05 1399.420044 1413.270020 1377.680054 1402.109985 1402.109985 1085500000 0.001922
4 2000-01-06 1402.109985 1411.900024 1392.099976 1403.449951 1403.449951 1092300000 0.000956
Upvotes: 1
Views: 54
Reputation: 322
This is a partial answer.
I think the way you do
asset.Close - asset.Close.shift(1)
at the top is key to how you might do this. Instead of
if merged_data.loc[index-i,'Close_x'] > merged_data.loc[index-i-1,'Close_x']
create a column with the change in Close_x:
merged_data['Delta_Close_x'] = merged_data.Close_x - merged_data.Close_x.shift(1)
Similarly,
if merged_data.loc[index-i,'return_stock'] > merged_data.loc[index-i-1,'return_sp']
becomes
merged_data['vs_sp'] = merged_data.return_stock - merged_data.return_sp.shift(1)
Then you can iterate i and use subsets like
merged_data[merged_data['Delta_Close_x'] > 0 and merged_data['vs_sp'] > 0]
There are a lot of additional details to work out, but I hope this gets you started.
Upvotes: 1