Reputation: 389
I am transitioning from R to Python and am having a difficult time replicating the following code:
df = df %>% group_by(ID) %>% slice(seq_len(min(which(F < 1 & D == 8), n())))
Sample Data:
ID Price F D
1 10.1 1 NAN
1 10.4 1 NAN
1 10.6 .8 8
1 8.1 .8 NAN
1 8.5 .8 NAN
2 22.4 2 NAN
2 22.1 2 NAN
2 21.1 .9 8
2 20.1 .9 NAN
2 20.1 .9 6
with the desired output:
ID Price F D
1 10.1 1 NAN
1 10.4 1 NAN
2 22.4 2 NAN
2 22.1 2 NAN
I believe the code in python would include some sort of: np.where, cumcount(), and slice.
However, I have no idea how I would go about doing this. Any help would be appreciated, thank you.
EDIT: To anyone who comes to this question in the future hoping to find a solution: yatu's solution worked fine, but I have worked my way to another solution which I found a bit easier to read:
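import numpy as np
# Flag rows where both conditions hold (1) or not (0)
df['temp'] = np.where((df['F'] < 1) & (df['D'] == 8), 1, 0)
# Keep rows while the running count of flags per ID is still 0
mask = df.groupby('ID')['temp'].cumsum().eq(0)
df[mask]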
I've read up on masking a bit and it really does help simplify the complexities of python quite a bit!
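For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same masking idea. It assumes the sample data above is typed in by hand with the NAN entries as np.nan; the names hit and result are only illustrative, and the temporary column is replaced by a plain boolean series.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NAN entries are entered as np.nan
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Price": [10.1, 10.4, 10.6, 8.1, 8.5, 22.4, 22.1, 21.1, 20.1, 20.1],
    "F":     [1, 1, 0.8, 0.8, 0.8, 2, 2, 0.9, 0.9, 0.9],
    "D":     [np.nan, np.nan, 8, np.nan, np.nan, np.nan, np.nan, 8, np.nan, 6],
})

# True on rows where both stop conditions hold at the same time
hit = (df["F"] < 1) & (df["D"] == 8)

# Running count of hits within each ID; it is 0 only before the first hit
mask = hit.astype(int).groupby(df["ID"]).cumsum().eq(0)

# Keep only the rows that come before the first hit in each group
result = df[mask]
print(result)
```

Comparing NaN with 8 evaluates to False, so the missing D values never trigger the cut.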
Upvotes: 1
Views: 333
Reputation: 88276
You could index the dataframe using the conditions below (the F and D columns are named Factor and Distro here):
c1 = ~df.Distro.eq(8).groupby(df.ID).cumsum().astype(bool)
c2 = df.Factor.lt(1).groupby(df.ID).cumsum().eq(0)
df[c1 & c2]
   ID  Price  Factor Distro
0   1   10.1     1.0    NAN
1   1   10.4     1.0    NAN
5   2   22.4     2.0    NAN
6   2   22.1     2.0    NAN
Note that taking the .cumsum of a boolean series within each group effectively propagates the True values: as soon as a True appears, every later value in that group is non-zero (truthy). Negating this result lets you drop rows from the dataframe from the first match onwards.
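To see the propagation on its own, here is a small sketch on a made-up boolean series (flags, running, seen and keep are illustrative names, not part of the answer above):

```python
import pandas as pd

# Hypothetical flags: the first True appears at position 2
flags = pd.Series([False, False, True, False, True])

# Running count of True values seen so far
running = flags.cumsum()

# True from the first True onwards, False before it
seen = running.gt(0)

# Negated: True only for the rows before the first True
keep = ~seen

print(pd.DataFrame({"flags": flags, "running": running, "seen": seen, "keep": keep}))
```

Comparing the running count with 0 (via .gt(0) or .eq(0)) gives a proper boolean mask regardless of the dtype that cumsum returns.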
Details
The following dataframe shows the original data along with the two conditions used to index it. In this case, since both criteria are first met in the same rows, the two conditions behave identically:
df.assign(c1=c1, c2=c2)
   ID  Price  Factor Distro     c1     c2
0   1   10.1     1.0    NAN   True   True
1   1   10.4     1.0    NAN   True   True
2   1   10.6     0.8      8  False  False
3   1    8.1     0.8    NAN  False  False
4   1    8.5     0.8    NAN  False  False
5   2   22.4     2.0    NAN   True   True
6   2   22.1     2.0    NAN   True   True
7   2   21.1     0.9      8  False  False
8   2   20.1     0.9    NAN  False  False
9   2   20.1     0.9      6  False  False
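If the two criteria are first met in different rows, the two separate conditions and a single combined condition can give different results. The sketch below uses made-up data (one group where Factor drops below 1 a row before Distro equals 8) to illustrate this; the names separate, both and combined are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical group: Factor < 1 starts at row 1, Distro == 8 only at row 2
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1],
    "Factor": [1.0, 0.8, 0.8, 0.8],
    "Distro": [np.nan, np.nan, 8, np.nan],
})

# Two independent cumulative conditions: cut at whichever is met first
c1 = df.Distro.eq(8).astype(int).groupby(df.ID).cumsum().eq(0)
c2 = df.Factor.lt(1).astype(int).groupby(df.ID).cumsum().eq(0)
separate = df[c1 & c2]

# One combined condition: cut at the first row where both hold together
both = df.Factor.lt(1) & df.Distro.eq(8)
combined = df[both.astype(int).groupby(df.ID).cumsum().eq(0)]

print(separate.index.tolist())  # rows kept with separate conditions
print(combined.index.tolist())  # rows kept with the combined condition
```

Which variant is appropriate depends on whether the cut should happen when either criterion is first met or only when both are met in the same row, as in the original R filter.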
Upvotes: 1