Python's equivalent of R's seq_len, slice, and where?

Question

I am transitioning from R to Python and am having a difficult time replicating the following code:

df = df %>% group_by(ID) %>% slice(seq_len(min(which(F < 1 & D == 8), n())))

Sample Data:

ID     Price        F         D
 1      10.1       1          NAN
 1      10.4       1          NAN 
 1      10.6      .8           8
 1      8.1       .8          NAN
 1      8.5       .8          NAN 
 2      22.4       2          NAN
 2      22.1       2          NAN
 2      21.1      .9           8
 2      20.1      .9          NAN
 2      20.1      .9           6

with the desired output:

ID     Price       F           D
 1      10.1       1          NAN
 1      10.4       1          NAN 
 2      22.4       2          NAN
 2      22.1       2          NAN

I believe the code in python would include some sort of: np.where, cumcount(), and slice.

However, I have no idea how I would go about doing this. Any help would be appreciated, thank you.

EDIT: For anyone who finds this question later while looking for a solution: yatu's solution worked fine, but I worked my way to another one that I find a bit easier to read:

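import numpy as np
df['temp'] = np.where((df['F'] < 1) & (df['D'] == 8), 1, 0)  # 1 where both conditions hold
# Keep rows while the per-ID running count of matches is still 0
mask = df.groupby('ID')['temp'].cumsum().eq(0)
df[mask]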

I've read up on boolean masking a bit, and it really does simplify this kind of problem in Python!
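To check the mask approach against the sample data, here is a minimal self-contained sketch. It assumes the "NAN" entries in the table are missing values (np.nan) and builds the mask directly from the boolean condition instead of a temporary column; the names hit and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan stands in for the "NAN" entries.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Price": [10.1, 10.4, 10.6, 8.1, 8.5, 22.4, 22.1, 21.1, 20.1, 20.1],
    "F":     [1, 1, 0.8, 0.8, 0.8, 2, 2, 0.9, 0.9, 0.9],
    "D":     [np.nan, np.nan, 8, np.nan, np.nan, np.nan, np.nan, 8, np.nan, 6],
})

# True on rows where both conditions hold at once.
hit = (df["F"] < 1) & (df["D"] == 8)

# Running count of hits within each ID; it is 0 only before the first hit.
mask = hit.astype(int).groupby(df["ID"]).cumsum().eq(0)

result = df[mask]
print(result)
```

On this data the rows kept are the first two of each ID, which matches the desired output above.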

yatu · Accepted Answer

You could index the dataframe using the conditions below:

c1 = ~df.Distro.eq(8).groupby(df.ID).cumsum().astype(bool)
c2 = df.Factor.lt(1).groupby(df.ID).cumsum().eq(0)
df[c1 & c2]

   ID  Price  Factor  Distro
0   1   10.1     1.0    NAN
1   1   10.4     1.0    NAN
5   2   22.4     2.0    NAN
6   2   22.1     2.0    NAN

Note that taking the .cumsum of a boolean series within each group propagates the first True forward: the running total is 0 until the first True appears and stays nonzero for every row after it. Converted to a boolean and negated (or, equivalently, compared with 0), this gives a mask that removes all rows of a group from the first match onward.
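The propagation effect can be seen on a single toy series. This is only an illustrative sketch; the series flags is made up, and the explicit cast to int is there so the running total is an integer count regardless of how cumsum treats booleans.

```python
import pandas as pd

flags = pd.Series([False, False, True, False, True])

# Cast to int so the running total is an explicit count of True values so far.
running = flags.astype(int).cumsum()

# True only on the rows that come before the first True in `flags`.
before_first = running.eq(0)

print(running.tolist())
print(before_first.tolist())
```

Once the count leaves 0 it never returns, which is why a later False in flags does not bring rows back.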

Details

The following shows the original dataframe alongside the two conditions used to index it. Because both criteria are first met in the same row of each group in this sample, the two conditions behave identically:

df.assign(c1=c1, c2=c2)

   ID  Price  Factor Distro     c1     c2
0   1   10.1     1.0    NAN   True   True
1   1   10.4     1.0    NAN   True   True
2   1   10.6     0.8      8  False  False
3   1    8.1     0.8    NAN  False  False
4   1    8.5     0.8    NAN  False  False
5   2   22.4     2.0    NAN   True   True
6   2   22.1     2.0    NAN   True   True
7   2   21.1     0.9      8  False  False
8   2   20.1     0.9    NAN  False  False
9   2   20.1     0.9      6  False  False
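For readers coming from the R pipeline in the question, a more literal, position-based translation is also possible. The sketch below is not part of the original answer: the helper rows_before_first_hit is hypothetical, np.flatnonzero stands in for R's which() (with 0-based positions), and .iloc[:k] stands in for slice(seq_len(k)). It tests both criteria on the same row and keeps the rows before the first match, or the whole group when nothing matches.

```python
import numpy as np
import pandas as pd

# Sample data with the column names used in this answer; np.nan replaces "NAN".
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Price":  [10.1, 10.4, 10.6, 8.1, 8.5, 22.4, 22.1, 21.1, 20.1, 20.1],
    "Factor": [1, 1, 0.8, 0.8, 0.8, 2, 2, 0.9, 0.9, 0.9],
    "Distro": [np.nan, np.nan, 8, np.nan, np.nan, np.nan, np.nan, 8, np.nan, 6],
})

def rows_before_first_hit(group):
    # Hypothetical helper: 0-based positions of matching rows, like R's which().
    cond = (group["Factor"] < 1) & (group["Distro"] == 8)
    hits = np.flatnonzero(cond.to_numpy())
    # The 0-based position of the first hit equals the number of rows before it;
    # fall back to the full group length when there is no hit.
    k = hits[0] if hits.size else len(group)
    return group.iloc[:k]

result = pd.concat([rows_before_first_hit(g) for _, g in df.groupby("ID")])
print(result)
```

The boolean-mask version is vectorised and usually the better choice on large frames; the loop here mainly makes the correspondence with the R code explicit.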
