Reputation: 33
I have a dataframe with prices and Entry Exit Signals and want to calculate pnl without for looping the dataframe:
If Signal == "Entry (short)" I will short A and buy B, vice versa for "Entry (long)". Signal == "Exit (short) closes out "Entry (short)" and vice versa for "Exit (long)" all in chronological order so corresponding Exits come after corresponding Entry.
e.g.
price_df[price_df["signal"].str.contains("E")] #filtered and show how entry exits are ordered
A B A/B Z score Signal
2021-04-19 4926.1 148.2251 2.187012 Entry (short)
2021-05-31 4357.7 159.8565 0.030637 Exit (short)
2022-01-06 4658.9 137.4595 2.139689 Entry (short)
2022-02-07 3624.1 123.2928 -0.018995 Exit (short)
Now, between each pair of entry and exit will be either -1 corresponding to Entry (short) or 1 corresponding to Entry (long).
e.g.
A B A/B Z score Signal
2021-04-19 4926.1 148.2251 2.187012 Entry (short)
2021-04-20 4726.9 145.8136 1.880473 -1
2021-04-21 4488.9 140.3698 1.715539 -1
...
2021-05-27 4333.4 157.0219 0.165915 -1
2021-05-28 4440.3 157.4703 0.417588 -1
2021-05-31 4357.7 159.8565 0.030637 Exit (short)
Goal I want to create a column p&L that calculates:
for Entry/Exit (short): (B.pct_change() - A.pct_change())/2
for Entry/Exit (long): (-B.pct_change() + A.pct_change())/2
So far I'm using a for loop to keep track of position etc but vectorization would be much faster...
Upvotes: 0
Views: 33
Reputation: 165
Assuming percent change is calculated between entry and corresponding exit, you can use the following vectorized approach
import pandas as pd
data = {
"date": ["2021-04-19", "2021-05-31", "2021-06-02", "2021-08-23", "2022-01-06", "2022-02-07", "2022-08-18", "2022-12-23"],
"A": [4926.1, 4357.7, 4658.9, 3624.1, 1232.3, 6343.3, 3214.5, 2122.2],
"B": [148.2251, 159.8565, 137.4595, 123.2928, 234.1234, 543.6543, 547.8976, 765.4908],
"Signal": ["Entry (short)", "Exit (short)", "Entry (long)", "Exit (long)", "Entry (short)", "Exit (short)", "Entry (long)", "Exit (long)"]
}
df = pd.DataFrame(data)
# sort by date
df = df.sort_values(by='date')
# keep entry/exit signal rows
df = df[df['Signal'].str.startswith('E')]
# calculate percent changes between entry and corresponding exit
df['A_pct_change'] = df['A'].pct_change()
df['B_pct_change'] = df['B'].pct_change()
# calculate PnL
df['PnL'] = np.where(
df['Signal'] == 'Exit (short)', (df['B_pct_change'] - df['A_pct_change'])/2,
np.where(
df['Signal'] == 'Exit (long)', (df['A_pct_change'] - df['B_pct_change'])/2, np.NAN
)
)
P.S. I have excluded rows with signal=-1 since they're not relevant for the calculation
Upvotes: 0