Reputation: 233
I have a dataframe along the lines of the below, for analysing stock data:
timestamp Price Exit Price
1 2019-09-29 15:33:00 14
2 2019-09-29 15:34:00 15
3 2019-09-29 15:35:00 14
4 2019-09-29 15:36:00 17
5 2019-09-29 15:37:00 20
I'm trying to backtest a strategy, so I want to populate the Exit Price column with a later value from the Price column, taken from the first subsequent row that meets either of the following conditions:
- the specified number of minutes has passed since the row's timestamp
- the price has risen by more than the specified return
So, for example, if the number of minutes is 2 and the return is 10%, the table should be populated as follows:
timestamp Price Exit Price
1 2019-09-29 15:33:00 14 14 <-- From Row 3 because 2 minutes passed
2 2019-09-29 15:34:00 15 17 <-- From Row 4, both conditions satisfied
3 2019-09-29 15:35:00 14 17 <-- From Row 4, difference greater than 10%
4 2019-09-29 15:36:00 17 20
5 2019-09-29 15:37:00 20 NaN
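To pin the rule down, here is a minimal reference sketch that rebuilds the sample data and applies the two conditions row by row. The helper name `first_exit` and its parameters `minutes` and `ret` are made up for illustration, and it assumes the rows are already in chronological order. It is deliberately slow and only meant to state the expected behaviour.

```python
import numpy as np
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-09-29 15:33:00", "2019-09-29 15:34:00", "2019-09-29 15:35:00",
        "2019-09-29 15:36:00", "2019-09-29 15:37:00",
    ]),
    "Price": [14, 15, 14, 17, 20],
})

def first_exit(df, i, minutes=2, ret=0.10):
    """Hypothetical helper: price of the first row after position i that meets either condition."""
    entry = df.iloc[i]
    later = df.iloc[i + 1:]  # only subsequent rows can be exits
    hit = (
        (later["timestamp"] >= entry["timestamp"] + pd.Timedelta(minutes=minutes))
        | (later["Price"] > entry["Price"] * (1 + ret))  # assumption: strictly greater than the return
    )
    return later.loc[hit, "Price"].iloc[0] if hit.any() else np.nan

expected = [first_exit(df, i) for i in range(len(df))]
```

On the sample rows this gives 14, 17, 17, 20 and NaN, matching the table.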
I thought about implementing a solution similar to this:
import numpy as np
import pandas as pd

def customFilter(row):
    # later rows that satisfy either exit condition
    results = df[
        (df['timestamp'] > row['timestamp']) &
        (
            (df['timestamp'] >= row['timestamp'] + pd.Timedelta(minutes=2)) |
            (df['Price'] > row['Price'] * 1.1)
        )
    ]
    if results.shape[0] > 0:
        return results['Price'].iloc[0]  # first matching row
    return np.nan

df['Exit Price'] = df.apply(customFilter, axis=1)
Question is, is there a better way to do this? It doesn't seem like the most efficient or fastest way, especially if I grow the size of my data set.
Upvotes: 2
Views: 202
Reputation: 8033
IIUC, this is what you need.
from datetime import timedelta
import numpy as np
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
E_Price = []
# Row k, column i: does row k satisfy the condition for entry row i?
time_diff = df['timestamp'].apply(lambda x: x >= (df['timestamp'] + timedelta(minutes=2)))
price_diff = df['Price'].apply(lambda x: x >= (df['Price'] * 1.1))
cond = time_diff | price_diff
for i in range(len(df)):
    check = cond[i].iloc[i + 1:]  # only rows after row i can be the exit
    if check.any():
        val = df.loc[check.idxmax(), 'Price']  # first later row that matches
    else:
        val = np.nan
    E_Price.append(val)
df['Exit_Price'] = E_Price
df['Exit_Price'] = df['Exit_Price'].astype(pd.Int32Dtype())
print(df)
Output
timestamp Price Exit_Price
0 2019-09-29 15:33:00 14 14
1 2019-09-29 15:34:00 15 17
2 2019-09-29 15:35:00 14 17
3 2019-09-29 15:36:00 17 20
4 2019-09-29 15:37:00 20 NaN
Upvotes: 1
Reputation: 5451
This is how you can do it using NumPy broadcasting:
import numpy as np
import pandas as pd

df = pd.DataFrame([('2019-09-29 15:33:00', '14'), ('2019-09-29 15:34:00', '15'), ('2019-09-29 15:35:00', '14'), ('2019-09-29 15:36:00', '17'), ('2019-09-29 15:37:00', '20')], columns=('timestamp', 'Price'))
df.timestamp = pd.to_datetime(df.timestamp)
df.Price = df.Price.astype(int)  # np.int was removed in NumPy 1.24
price = df.Price.values
timestamp = df["timestamp"].values
## compare every pair of rows against the condition
## np.newaxis turns the values into a column, so broadcasting builds a len(rows) x len(rows) matrix
## cond[k, i] is True when row k satisfies the exit condition for entry row i
cond = ((timestamp + np.timedelta64(2, "m")) <= timestamp[:, np.newaxis]) | \
       ((price[:, np.newaxis] - price) / price >= 0.1)
cond &= np.tril(np.ones_like(cond), k=-1)  # keep only later rows (k > i) as candidates
## argmax gives the first index matching cond (0 when nothing matches)
first = cond.argmax(axis=0)
df.loc[first > 0, "Extra Price"] = price[first[first > 0]]
print(df)
Result
timestamp Price Extra Price
0 2019-09-29 15:33:00 14 14.0
1 2019-09-29 15:34:00 15 17.0
2 2019-09-29 15:35:00 14 17.0
3 2019-09-29 15:36:00 17 20.0
4 2019-09-29 15:37:00 20 NaN
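One caveat for larger data sets: the broadcast comparison builds a len(rows) x len(rows) boolean matrix, so memory grows quadratically with the number of rows. A possible lower-memory variant, sketched below, uses `np.searchsorted` to find the time-based exit and scans only the prices up to that row. The function name `exit_prices` and its parameters are hypothetical, and the sketch assumes the timestamps are sorted ascending and that both thresholds are inclusive, as in the code above.

```python
import numpy as np
import pandas as pd

def exit_prices(df, minutes=2, ret=0.10):
    """Hypothetical helper: exit price per row, scanning only up to the time-based exit."""
    ts = df["timestamp"].to_numpy()
    price = df["Price"].to_numpy(dtype=float)
    n = len(df)
    # Assumes ts is sorted ascending: index of the first row at least `minutes` later.
    time_hit = np.searchsorted(ts, ts + np.timedelta64(minutes, "m"), side="left")
    out = np.full(n, np.nan)
    for i in range(n):
        stop = min(time_hit[i], n - 1)   # no need to look past the time-based exit
        window = price[i + 1 : stop + 1]
        hits = np.flatnonzero(window >= price[i] * (1 + ret))
        if hits.size:                    # return target reached first (or on the same row)
            out[i] = window[hits[0]]
        elif time_hit[i] < n:            # otherwise exit on the time condition
            out[i] = price[time_hit[i]]
    return out

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-09-29 15:33:00", "2019-09-29 15:34:00", "2019-09-29 15:35:00",
        "2019-09-29 15:36:00", "2019-09-29 15:37:00",
    ]),
    "Price": [14, 15, 14, 17, 20],
})
result = exit_prices(df)
```

For the sample frame this returns 14, 17, 17, 20 and NaN. The loop is still Python-level, but each iteration only touches the rows inside its time window, so the working memory stays linear in the number of rows.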
Upvotes: 1