Anonymous
Anonymous

Reputation: 33

Calculate PnL with vectorization

I have a dataframe with prices and Entry Exit Signals and want to calculate pnl without for looping the dataframe:

If Signal == "Entry (short)" I will short A and buy B, vice versa for "Entry (long)". Signal == "Exit (short) closes out "Entry (short)" and vice versa for "Exit (long)" all in chronological order so corresponding Exits come after corresponding Entry.

e.g.

price_df[price_df["signal"].str.contains("E")] #filtered and show how entry exits are ordered

              A        B        A/B Z score    Signal
2021-04-19  4926.1  148.2251    2.187012    Entry (short)
2021-05-31  4357.7  159.8565    0.030637    Exit (short)
2022-01-06  4658.9  137.4595    2.139689    Entry (short)
2022-02-07  3624.1  123.2928    -0.018995   Exit (short)

Now, between each pair of entry and exit will be either -1 corresponding to Entry (short) or 1 corresponding to Entry (long).

e.g.

              A        B        A/B Z score    Signal
2021-04-19  4926.1  148.2251    2.187012    Entry (short)
2021-04-20  4726.9  145.8136    1.880473    -1
2021-04-21  4488.9  140.3698    1.715539    -1
...
2021-05-27  4333.4  157.0219    0.165915    -1
2021-05-28  4440.3  157.4703    0.417588    -1
2021-05-31  4357.7  159.8565    0.030637    Exit (short)

Goal I want to create a column p&L that calculates:

for Entry/Exit (short): (B.pct_change() - A.pct_change())/2

for Entry/Exit (long): (-B.pct_change() + A.pct_change())/2

So far I'm using a for loop to keep track of position etc but vectorization would be much faster...

Upvotes: 0

Views: 33

Answers (1)

sud
sud

Reputation: 165

Assuming percent change is calculated between entry and corresponding exit, you can use the following vectorized approach

import pandas as pd


data = {
    "date": ["2021-04-19", "2021-05-31", "2021-06-02", "2021-08-23", "2022-01-06", "2022-02-07", "2022-08-18", "2022-12-23"],
    "A": [4926.1, 4357.7, 4658.9, 3624.1, 1232.3, 6343.3, 3214.5, 2122.2],
    "B": [148.2251, 159.8565, 137.4595, 123.2928, 234.1234, 543.6543, 547.8976, 765.4908],
    "Signal": ["Entry (short)", "Exit (short)", "Entry (long)", "Exit (long)", "Entry (short)", "Exit (short)", "Entry (long)", "Exit (long)"]
}

df = pd.DataFrame(data)

# sort by date
df = df.sort_values(by='date')

# keep entry/exit signal rows
df = df[df['Signal'].str.startswith('E')]

# calculate percent changes between entry and corresponding exit
df['A_pct_change'] = df['A'].pct_change()
df['B_pct_change'] = df['B'].pct_change()


# calculate PnL
df['PnL'] = np.where(
    df['Signal'] == 'Exit (short)', (df['B_pct_change'] - df['A_pct_change'])/2,
    np.where(
        df['Signal'] == 'Exit (long)', (df['A_pct_change'] - df['B_pct_change'])/2, np.NAN
    )
)

P.S. I have excluded rows with signal=-1 since they're not relevant for the calculation

Upvotes: 0

Related Questions