yungpadewon

Reputation: 389

Python's equivalent of R's seq_len, slice, and where?

I'm transitioning from R to Python and am having a difficult time replicating the following code:

df = df %>% group_by(ID) %>% slice(seq_len(min(which(F < 1 & D == 8), n())))

Sample Data:

ID     Price        F         D
 1      10.1       1          NAN
 1      10.4       1          NAN 
 1      10.6      .8           8
 1      8.1       .8          NAN
 1      8.5       .8          NAN 
 2      22.4       2          NAN
 2      22.1       2          NAN
 2      21.1      .9           8
 2      20.1      .9          NAN
 2      20.1      .9           6

with the desired output:

ID     Price       F           D
 1      10.1       1          NAN
 1      10.4       1          NAN 
 2      22.4       2          NAN
 2      22.1       2          NAN

I believe the Python code would involve some combination of np.where, cumcount(), and slicing.

However, I have no idea how I would go about doing this. Any help would be appreciated, thank you.

EDIT: To anyone who comes to this question in the future hoping to find a solution: yatu's solution worked fine, but I have worked my way to another solution which I found a bit easier to read:

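import numpy as np
# Flag rows where both conditions hold (1) and all other rows (0)
df['temp'] = np.where((df['F'] < 1) & (df['D'] == 8), 1, 0)
# The running total per ID stays at 0 until the first flagged row
mask = df.groupby('ID')['temp'].cumsum().eq(0)
df[mask]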

I've read up on boolean masking a bit, and it really does simplify a lot of things in Python!
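
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It rebuilds the sample data above and skips the temporary column. The DataFrame construction and the names hit and result are my own assumptions for illustration, and NaN stands in for the NAN entries:

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan stands in for the "NAN" entries.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Price": [10.1, 10.4, 10.6, 8.1, 8.5, 22.4, 22.1, 21.1, 20.1, 20.1],
    "F":     [1, 1, 0.8, 0.8, 0.8, 2, 2, 0.9, 0.9, 0.9],
    "D":     [np.nan, np.nan, 8, np.nan, np.nan, np.nan, np.nan, 8, np.nan, 6],
})

# True on rows where both conditions hold at once.
hit = (df["F"] < 1) & (df["D"] == 8)

# Running count of hits per ID; it stays at 0 until the first hit in each group.
mask = hit.astype(int).groupby(df["ID"]).cumsum().eq(0)

# Keep only the rows that come before the first hit within each ID.
result = df[mask]
print(result)
```

This keeps the rows with index 0, 1, 5 and 6, which matches the desired output above.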

Upvotes: 1

Views: 333

Answers (1)

yatu

Reputation: 88276

You could index the dataframe using the conditions below:

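# True until the first row in each ID where Distro equals 8
c1 = df.Distro.eq(8).groupby(df.ID).cumsum().eq(0)
# True until the first row in each ID where Factor is below 1
c2 = df.Factor.lt(1).groupby(df.ID).cumsum().eq(0)
df[c1 & c2]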

   ID  Price  Factor  Distro
0   1   10.1     1.0    NAN
1   1   10.4     1.0    NAN
5   2   22.4     2.0    NAN
6   2   22.1     2.0    NAN

Note that taking the .cumsum of a boolean series within each group essentially propagates the first True value: the running total is 0 before the first True and stays positive from that row onwards. Inverting this result gives a mask that removes rows from the dataframe from the first matching row onwards.
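
As a small illustration of that propagation, here is a sketch on a standalone boolean series. The names flags, running and keep are made up for this example and are not part of the code above:

```python
import pandas as pd

# Hypothetical boolean series: the first True is at position 2.
flags = pd.Series([False, False, True, False, True])

# Cumulative count of True values seen so far: 0, 0, 1, 1, 2.
running = flags.astype(int).cumsum()

# Rows strictly before the first True: True, True, False, False, False.
keep = running.eq(0)

print(keep.tolist())
```

Once running leaves 0 it never returns to it, so keep stays False for every row from the first True onwards, including later rows where the flag itself is False.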


Details

The following dataframe shows the original dataframe along with the conditions used to index it. In this case, given that both criteria are first met in the same row of each group, both conditions show the same behaviour:

df.assign(c1=c1, c2=c2)

   ID  Price  Factor Distro     c1     c2
0   1   10.1     1.0    NAN   True   True
1   1   10.4     1.0    NAN   True   True
2   1   10.6     0.8      8  False  False
3   1    8.1     0.8    NAN  False  False
4   1    8.5     0.8    NAN  False  False
5   2   22.4     2.0    NAN   True   True
6   2   22.1     2.0    NAN   True   True
7   2   21.1     0.9      8  False  False
8   2   20.1     0.9    NAN  False  False
9   2   20.1     0.9      6  False  False
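
If the two criteria were first met in different rows, the two conditions would no longer agree, and c1 & c2 would cut each group at whichever comes first. A rough sketch with made-up data (the dataframe below is hypothetical and not taken from the question) compares this with a single joint condition like the one in the question's edit:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Factor falls below 1 one row before Distro equals 8.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1],
    "Factor": [1.0, 0.8, 0.8, 0.8],
    "Distro": [np.nan, np.nan, 8, np.nan],
})

# Separate conditions: rows are dropped from the first row meeting either one.
c1 = df.Distro.eq(8).astype(int).groupby(df.ID).cumsum().eq(0)
c2 = df.Factor.lt(1).astype(int).groupby(df.ID).cumsum().eq(0)
separate = df[c1 & c2]

# Joint condition: rows are dropped from the first row meeting both at once.
both = df.Factor.lt(1) & df.Distro.eq(8)
joint = df[both.astype(int).groupby(df.ID).cumsum().eq(0)]

print(separate.index.tolist())  # rows kept with separate conditions
print(joint.index.tolist())     # rows kept with the joint condition
```

Here the separate conditions keep only row 0, while the joint condition keeps rows 0 and 1, so which one to use depends on whether the criteria must hold in the same row.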

Upvotes: 1
