Python Pandas - Time Series Find Index of Previous Row

Question

I am analyzing time series data for one stock to find the highest price for further analysis. Here is a sample DataFrame df:

   date      close   high_3days
2021-05-01    20        20
2021-05-02    23        23
2021-05-03    26        26
2021-05-04    24        26
2021-05-05    20        26
2021-05-06    26        26
2021-05-07    22        26
2021-05-08    30        30
2021-05-09    20        30
2021-05-10    20        30
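The question does not say how high_3days is computed. The values above are consistent with a 3-day rolling maximum of close, so here is a minimal sketch that rebuilds the sample frame under that assumption. The rolling window is a guess and is not stated in the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": pd.date_range("2021-05-01", periods=10, freq="D"),
    "close": [20, 23, 26, 24, 20, 26, 22, 30, 20, 20],
})

# Assumption: high_3days is the max close over the current row and the 2 rows before it.
# min_periods=1 lets the first two rows use the shorter window that is available.
df["high_3days"] = df["close"].rolling(window=3, min_periods=1).max().astype(int)

print(df)
```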

I want to add a new column with the number of days since the previous 3-day high. My idea is to find the index of the row where the previous high occurred, and then subtract it from the index of the current row.
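The logic described in the question (locate the row where the previous high was set, then subtract its position from the current row's position) can be sketched directly. This minimal sketch assumes a "new high" means the running maximum of high_3days strictly increases, and it counts rows rather than calendar days, which is the same thing here because the sample has one row per day:

```python
import numpy as np
import pandas as pd

high = pd.Series([20, 23, 26, 26, 26, 26, 26, 30, 30, 30])

# Row positions 0..n-1, independent of whatever index the frame has
pos = pd.Series(np.arange(len(high)), index=high.index)

# A row sets a new high when the running maximum increases; the first row always does
running_max = high.cummax()
is_new_high = running_max.diff().gt(0)
is_new_high.iloc[0] = True

# Position of the most recent new-high row, carried forward to the rows after it
last_high_pos = pos.where(is_new_high).ffill()

# Days since the previous high = current position - position of that high
days_previous_high = (pos - last_high_pos).astype(int)
print(days_previous_high.tolist())
```

Here pos.where(...).ffill() plays the role of "the index of the previous high" from the question.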

Here is the desired output:

   date      close   high_3days  days_previous_high
2021-05-01    20        20               0
2021-05-02    23        23               0
2021-05-03    26        26               0
2021-05-04    24        26               1
2021-05-05    20        26               2
2021-05-06    22        26               3
2021-05-07    20        26               4
2021-05-08    30        30               0
2021-05-09    20        30               1
2021-05-10    20        30               2

Could you help me figure out a way to do this? Thanks guys!

Henry Ecker · Accepted Answer

Try creating a boolean mask with an expanding max, then number the rows in each group with groupby cumcount:

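# New group each time the running max of high_3days rises; cumcount numbers the rows within a group
df['days_previous_high'] = df.groupby(
    df['high_3days'].expanding().max().diff().gt(0).cumsum()).cumcount()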

df:

         date  close  high_3days  days_previous_high
0  2021-05-01     20          20                   0
1  2021-05-02     23          23                   0
2  2021-05-03     26          26                   0
3  2021-05-04     24          26                   1
4  2021-05-05     20          26                   2
5  2021-05-06     22          26                   3
6  2021-05-07     20          26                   4
7  2021-05-08     30          30                   0
8  2021-05-09     20          30                   1
9  2021-05-10     20          30                   2
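A self-contained version of the one-liner above, with the sample data from the question's first table typed in so it can be run as-is:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2021-05-01", periods=10, freq="D"),
    "close": [20, 23, 26, 24, 20, 26, 22, 30, 20, 20],
    "high_3days": [20, 23, 26, 26, 26, 26, 26, 30, 30, 30],
})

# Group key: increments every time the running max of high_3days rises
group_key = df["high_3days"].expanding().max().diff().gt(0).cumsum()

# Number rows within each group: 0 on the new-high row, then 1, 2, ...
df["days_previous_high"] = df.groupby(group_key).cumcount()
print(df)
```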

Explanation:

An expanding max gives the running maximum of all values up to and including each row:

df['high_3days'].expanding().max()
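For reference, a short sketch that prints the running maximum for the sample column. Series.cummax() returns the same values and is a possible alternative spelling; the difference is that expanding().max() returns floats while cummax() keeps the integer dtype:

```python
import pandas as pd

high = pd.Series([20, 23, 26, 26, 26, 26, 26, 30, 30, 30], name="high_3days")

running_max = high.expanding().max()   # float64: 20.0, 23.0, 26.0, ...
print(running_max.tolist())

# cummax gives the same running maximum without converting to float
assert running_max.equals(high.cummax().astype(float))
```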

diff shows where the running maximum increases: the difference is positive exactly where the current value exceeds the previous maximum.

df['high_3days'].expanding().max().diff()
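On the sample column, the diff of the running max is NaN on the first row (there is no previous row), positive where a new high is set, and 0 otherwise. A quick sketch showing that, and how gt(0) turns it into a boolean mask:

```python
import pandas as pd

high = pd.Series([20, 23, 26, 26, 26, 26, 26, 30, 30, 30])
max_diff = high.expanding().max().diff()

# The first diff is NaN; comparing NaN > 0 evaluates to False
new_high = max_diff.gt(0)
print(max_diff.tolist())
print(new_high.tolist())
```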

Groups can be created by taking the cumsum of the rows where the diff is greater than 0. Each True starts a new group:

df['high_3days'].expanding().max().diff().gt(0).cumsum()

expanding_max  expanding_max_diff  expanding_max_gt_0  expanding_max_gt_0_cs
         20.0                 NaN               False                      0
         23.0                 3.0                True                      1
         26.0                 3.0                True                      2
         26.0                 0.0               False                      2
         26.0                 0.0               False                      2
         26.0                 0.0               False                      2
         26.0                 0.0               False                      2
         30.0                 4.0                True                      3
         30.0                 0.0               False                      3
         30.0                 0.0               False                      3
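The table above lines up the intermediate steps side by side. A sketch that rebuilds it with the same column names, assuming the sample high_3days values:

```python
import pandas as pd

high = pd.Series([20, 23, 26, 26, 26, 26, 26, 30, 30, 30])

expanding_max = high.expanding().max()
expanding_max_diff = expanding_max.diff()
expanding_max_gt_0 = expanding_max_diff.gt(0)

steps = pd.DataFrame({
    "expanding_max": expanding_max,
    "expanding_max_diff": expanding_max_diff,
    "expanding_max_gt_0": expanding_max_gt_0,
    # Running count of True values = group number for each row
    "expanding_max_gt_0_cs": expanding_max_gt_0.cumsum(),
})
print(steps.to_string(index=False))
```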

Now that the rows are grouped, groupby cumcount numbers the rows within each group starting from 0, which is the number of rows since that group's new high:

df.groupby(df['high_3days'].expanding().max().diff().gt(0).cumsum()).cumcount()

0    0
1    0
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
dtype: int64
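To sanity-check the vectorized result, here is a plain-Python loop that computes the same counts row by row. The function name days_since_running_high is hypothetical, and like the answer it treats a row as a new high only when it strictly exceeds every earlier value:

```python
def days_since_running_high(values):
    """Hypothetical helper: reset the counter whenever the running maximum strictly increases."""
    result = []
    running_max = None
    count = 0
    for v in values:
        if running_max is None or v > running_max:
            # New high: this row starts a new group
            running_max = v
            count = 0
        else:
            # Same group: one more row since the last high
            count += 1
        result.append(count)
    return result


print(days_since_running_high([20, 23, 26, 26, 26, 26, 26, 30, 30, 30]))
```

A loop like this is easy to read but slow on long series, which is why the answer uses groupby and cumcount instead.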
