imM4TT

Reputation: 282

Retrieve number of rows between the current row and the last/next positive value

I have the following dataframe:

   feature
0        1
1        0
2        0
3        0
4        0
5        1
6        0
7        1

I would like to create two columns containing the number of rows between the current row and the previous and next positive values. The output dataframe should look something like this:

   feature    previous_feat        next_feat
0        1               NA                5
1        0                1                4
2        0                2                3
3        0                3                2
4        0                4                1
5        1                5                2
6        0                1                1
7        1                2               NA

I already tried various combinations of the shift and mask methods, but I haven't managed to make it work. Note that it can be either the number of rows or the index difference; it doesn't really matter to me. The same goes for the NA values: they can be NA or 0.

import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# df["previous_feat"] = df.shift().mask(df["feature"] != 0)
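The shift/mask idea from the attempt can be made to work by masking row positions instead of feature values. This is a minimal sketch, assuming "positive" means `> 0` and that distances are counted in rows rather than index labels; the names `pos`, `pos_if_positive`, `last_pos` and `next_pos` are illustrative only. Each positive row keeps its position, the series is shifted by one so a row never counts itself, and the gaps are forward-filled (previous) or back-filled (next).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# Row position 0..n-1, aligned with df's index
pos = pd.Series(np.arange(len(df)), index=df.index)

# Keep the position only on rows holding a positive value; other rows become NaN
pos_if_positive = pos.where(df["feature"].gt(0))

# Shift first so each row only looks strictly before/after itself, then fill the gaps
last_pos = pos_if_positive.shift().ffill()    # last positive row before each row
next_pos = pos_if_positive.shift(-1).bfill()  # next positive row after each row

df["previous_feat"] = pos - last_pos  # NaN where no earlier positive value exists
df["next_feat"] = next_pos - pos      # NaN where no later positive value exists
print(df)
```

Rows before the first positive value (or after the last one) get NaN; append `.fillna(0)` if 0 is preferred.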

Upvotes: 2

Views: 53

Answers (2)

Chrysophylaxs

Reputation: 6583

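# a new group starts on the row right after each positive value
groups = df["feature"].shift().cumsum()
# rows since the last positive value (1-based position within the group)
df["previous_feat"] = df["feature"].groupby(groups).cumcount().add(1)
# rows until the next positive value (reverse position in the group, moved up one row)
df["next_feat"] = df["feature"].groupby(groups).cumcount(ascending=False).add(1).shift(-1)

Output:
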
   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
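To see why the group keys work, this sketch rebuilds them on the same toy frame and lays the intermediate counts side by side; the `debug` frame and its column names are only for illustration. Row 0 gets a missing (NaN) group key because nothing precedes it, and the shifted running sum steps up exactly on the row following each positive value.

```python
import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# Shifting moves each positive value onto the next row, so the running sum
# increments right after every positive value
groups = df["feature"].shift().cumsum()

debug = pd.DataFrame({
    "feature": df["feature"],
    "group": groups,
    # position within the group counted from the top (basis of previous_feat)
    "count_fwd": df["feature"].groupby(groups).cumcount(),
    # position within the group counted from the bottom (basis of next_feat)
    "count_bwd": df["feature"].groupby(groups).cumcount(ascending=False),
})
print(debug)
```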

Upvotes: 2

mozway

Reputation: 260580

You can use groupby.cumcount and boolean masks:

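# form groups: g1 counts positive values from the bottom, g2 from the top
g1 = df.loc[::-1, 'feature'].eq(1).cumsum()
g2 = df['feature'].eq(1).cumsum()

# mask the first/last positive row, which has no previous/next positive value
m1 = g2.eq(1) & df['feature'].eq(1)
m2 = g1.eq(1) & df['feature'].eq(1)

# compute cumcount (counting on the reversed frame gives the distance to the next positive)
df['previous_feat'] = df.groupby(g1).cumcount().add(1).mask(m1)
df['next_feat'] = df[::-1].groupby(g2).cumcount().add(1).mask(m2)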

Output:

   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
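A different technique from both answers, sketched here only as an alternative: find the sorted positions of the positive rows once, then use np.searchsorted to locate, for every row, the nearest positive position strictly before and strictly after it. The helper name `distances_to_positive` is hypothetical, "positive" is assumed to mean `> 0`, and distances are row counts rather than index differences.

```python
import numpy as np
import pandas as pd


def distances_to_positive(values):
    """Hypothetical helper: rows since the previous positive value and rows
    until the next one, as two float arrays with NaN where none exists."""
    arr = np.asarray(values)
    rows = np.arange(len(arr))
    pos = np.flatnonzero(arr > 0)  # sorted positions of the positive values

    # Index into `pos` of the last positive strictly before each row (-1 = none)
    before = np.searchsorted(pos, rows, side="left") - 1
    # Index into `pos` of the first positive strictly after each row (len(pos) = none)
    after = np.searchsorted(pos, rows, side="right")

    prev = np.full(len(arr), np.nan)
    nxt = np.full(len(arr), np.nan)
    has_prev = before >= 0
    has_next = after < len(pos)
    prev[has_prev] = rows[has_prev] - pos[before[has_prev]]
    nxt[has_next] = pos[after[has_next]] - rows[has_next]
    return prev, nxt


df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})
df["previous_feat"], df["next_feat"] = distances_to_positive(df["feature"])
print(df)
```

Because it works on positions, rows before the first positive value also get NaN for previous_feat, and a column with no positive values yields all NaN.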

Upvotes: 4
