Retrieve number of rows between the current row and the last/next positive value

Question

I have the following dataframe:

   feature
0        1
1        0
2        0
3        0
4        0
5        1
6        0
7        1

I would like to create 2 columns containing the number of rows between the current row and the last and next positive values. The output dataframe should look something like this:

   feature    previous_feat        next_feat
0        1               NA                5
1        0                1                4
2        0                2                3
3        0                3                2
4        0                4                1
5        1                5                2
6        0                1                1
7        1                2               NA

I already tried combinations of the shift and mask methods, but I can't manage to make it work. Note that it could be either the number of rows or the index difference; it doesn't really matter to me. The same goes for the NA values: they could be NA or 0.

import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# df["previous_feat"] = df.shift().mask(df["feature"] != 0)
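For comparison, the shift/mask idea from the attempt above can be made to work by masking row positions rather than the feature values, then shifting and forward/backward-filling them. This is a minimal sketch that does not come from the original thread. It assumes that "positive" means `feature > 0` and that row positions are what should be counted. The names `pos`, `pos_of_positive`, `last_pos` and `next_pos` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# Row positions 0..n-1 (assumption: counting rows; with a numeric index,
# pd.Series(df.index, index=df.index) would give index differences instead)
pos = pd.Series(np.arange(len(df)), index=df.index)

# Keep the position only on positive rows, NaN everywhere else
pos_of_positive = pos.mask(df["feature"] <= 0)

# Shift by one so a positive row never matches itself, then fill downwards
# (nearest positive above) or upwards (nearest positive below)
last_pos = pos_of_positive.shift(1).ffill()
next_pos = pos_of_positive.shift(-1).bfill()

df["previous_feat"] = pos - last_pos
df["next_feat"] = next_pos - pos
print(df)
```

Rows with no positive value above them (or below them) stay NaN, which matches the NA cells in the expected output. This also holds if the column starts or ends with zeros.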

mozway · Accepted Answer

You can use groupby.cumcount and boolean masks:

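# form groups: g1 counts the 1s from the bottom up, g2 from the top down
g1 = df.loc[::-1, 'feature'].eq(1).cumsum()
g2 = df['feature'].eq(1).cumsum()

# mask the first positive row (no previous positive) and the last one (no next positive)
m1 = g2.eq(1) & df['feature'].eq(1)
m2 = g1.eq(1) & df['feature'].eq(1)

# number the rows inside each group: top-down for previous, bottom-up for next
df['previous_feat'] = df.groupby(g1).cumcount().add(1).mask(m1)
df['next_feat'] = df[::-1].groupby(g2).cumcount().add(1).mask(m2)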

Output:

   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
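To see why the cumcount trick works, it can help to print the intermediate groups next to the feature column. The sketch below recomputes the answer's `g1` and `g2`, plus the counts before masking. The names `is_one`, `prev_count`, `next_count` and `steps` are illustrative additions, and `sort_index()` is only used to display everything in top-down order.

```python
import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})
is_one = df["feature"].eq(1)

# g2: running count of 1s from the top. A new group starts AT each 1, so a
# group spans a positive row and the rows below it, up to the next positive.
g2 = is_one.cumsum()

# g1: running count of 1s from the bottom, restored to top-down order.
# A group spans the rows after a positive, down to and including the next one.
g1 = is_one[::-1].cumsum().sort_index()

# Counting top-down inside g1 groups gives the distance to the previous positive;
# counting bottom-up inside g2 groups gives the distance to the next positive.
prev_count = df.groupby(g1).cumcount().add(1)
next_count = df[::-1].groupby(g2).cumcount().add(1).sort_index()

steps = pd.DataFrame({
    "feature": df["feature"],
    "g1": g1,
    "g2": g2,
    "prev_count": prev_count,
    "next_count": next_count,
})
print(steps)
```

Before masking, the first row gets `prev_count` 1 and the last row gets `next_count` 1, even though no positive value exists above or below them. Those two placeholder values are exactly what `m1` and `m2` replace with NaN.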
