Jack Moody
Jack Moody

Reputation: 1771

More efficient way to change long/short position in pandas

Is there a more efficient way to change long/short positions in pandas than what I have come up with below?

The gist of the logic:

My Code

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'buy_sell': [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, -1, np.nan, -1, np.nan, 1, np.nan, -1],
    'position': np.nan
})

for i, r, in df.iterrows():
    buy_sell = r['buy_sell']

    # Check if first index
    if i != 0:
        last_position = df.loc[i-1, 'position']

        if np.isnan(buy_sell):
            df.loc[i, 'position'] = last_position
        else:
            if np.isnan(last_position) or last_position * buy_sell <= 0:
                df.loc[i, 'position'] = buy_sell
            else:
                df.loc[i, 'position'] = last_position + buy_sell
    else:
        df.loc[i, 'position'] = buy_sell

Expected solution

df_expected = pd.DataFrame({
    'buy_sell': [np.nan, 1, np.nan, 1, np.nan, np.nan, 1, -1, np.nan, -1, np.nan, 1, np.nan, -1],
    'position': [np.nan, 1, 1, 2, 2, 2, 3, -1, -1, -2, -2, 1, 1, -1],
})
    buy_sell  position
0        NaN       NaN
1        1.0       1.0
2        NaN       1.0
3        1.0       2.0
4        NaN       2.0
5        NaN       2.0
6        1.0       3.0
7       -1.0      -1.0
8        NaN      -1.0
9       -1.0      -2.0
10       NaN      -2.0
11       1.0       1.0
12       NaN       1.0
13      -1.0      -1.0

Note: The code I have above produces the expected data frame. I am asking if there is a more efficient/better way of doing what I have done above.

Upvotes: 4

Views: 894

Answers (1)

Pablo C
Pablo C

Reputation: 4771

You can do something like this:

df["position"] = df.buy_sell.groupby(\
                      df.buy_sell.bfill()\
                      .diff()\
                      .abs()\
                      .cumsum()\
                 ).cumsum().ffill()
                 

Output

df
     buy_sell   position
0         NaN        NaN
1      1.0000     1.0000
2         NaN     1.0000
3      1.0000     2.0000
4         NaN     2.0000
5         NaN     2.0000
6      1.0000     3.0000
7     -1.0000    -1.0000
8         NaN    -1.0000
9     -1.0000    -2.0000
10        NaN    -2.0000
11     1.0000     1.0000
12        NaN     1.0000
13    -1.0000    -1.0000

Explanation

Basically, you want some sort of cumsum by groups created by the sign changes in the column, then, first of all we need to find those changes:

>>> df.buy_sell.bfill().diff().abs()
0           NaN
1        0.0000
2        0.0000
3        0.0000
4        0.0000
5        0.0000
6        0.0000
7        2.0000
8        0.0000
9        0.0000
10       2.0000
11       0.0000
12       2.0000
13       0.0000

As you can see, every 2 is a sign change. Now, usign pandas.Series.cumsum we can assign a group to every row

>>> df.buy_sell.bfill().diff().abs().cumsum()
0           NaN
1        0.0000
2        0.0000
3        0.0000
4        0.0000
5        0.0000
6        0.0000
7        2.0000
8        2.0000
9        2.0000
10       4.0000
11       4.0000
12       6.0000
13       6.0000

So, every row with 0 is a group, every row with 2 is a group, and so on. With this, we can directly use pandas.Series.groupby with pandas.core.groupby.GroupBy.cumsum:

>>> df.buy_sell.groupby(df.buy_sell.bfill().diff().abs().cumsum()).cumsum()
0           NaN
1        1.0000
2           NaN
3        2.0000
4           NaN
5           NaN
6        3.0000
7       -1.0000
8           NaN
9       -2.0000
10          NaN
11       1.0000
12          NaN
13      -1.0000

All we need now is to propagate last valid observation forward to next valid with pandas.Series.ffill:

>>> df.buy_sell.groupby(df.buy_sell.bfill().diff().abs().cumsum()).cumsum().ffill()
0           NaN
1        1.0000
2        1.0000
3        2.0000
4        2.0000
5        2.0000
6        3.0000
7       -1.0000
8       -1.0000
9       -2.0000
10      -2.0000
11       1.0000
12       1.0000
13      -1.0000

Upvotes: 5

Related Questions