I have a DataFrame with an ID column and a binary column, like the example below:
ID BINARY_MASK
0 101 1
1 101 0
2 101 1
3 101 1
4 101 1
5 101 1
6 101 0
7 101 1
8 102 1
9 102 1
11 102 1
12 102 1
13 102 0
14 102 0
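To make the example reproducible, the frame above can be rebuilt with a short sketch like this one. The index labels are copied from the printout (10 does not appear there), and the variable name `df` is an assumption chosen to match the answers.

```python
import pandas as pd

# Index labels exactly as printed in the example; 10 is skipped there.
index = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14]

df = pd.DataFrame(
    {
        "ID": [101] * 8 + [102] * 6,
        "BINARY_MASK": [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    },
    index=index,
)
print(df)
```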
What I want to do is get the first four consecutive entries that are 1, per ID group. The result I would like to see is the following:
ID BINARY_MASK
2 101 1
3 101 1
4 101 1
5 101 1
8 102 1
9 102 1
11 102 1
12 102 1
The index inside the group where there are four consecutive ones differs per group, like in the example. How do I do this?
I have tried the solution that was offered by Bill G in this question, but that didn't work for me.
I am working with pandas DataFrames and Python 3.6.
Upvotes: 3
Create a helper Series for GroupBy.transform using cumsum of the shifted values compared with ne (!=), chain it with a second condition, and finally filter by boolean indexing:
# label runs: the counter increases each time BINARY_MASK changes value
s = df['BINARY_MASK'].ne(df['BINARY_MASK'].shift()).cumsum()
# group by ID as well, so a run of 1s cannot span two IDs
m1 = df.groupby(['ID', s])['BINARY_MASK'].transform('size') >= 4
m2 = df['BINARY_MASK'] == 1
df = df[m1 & m2]
print(df)
ID BINARY_MASK
2 101 1
3 101 1
4 101 1
5 101 1
8 102 1
9 102 1
11 102 1
12 102 1
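If a run can be longer than four, or an ID can contain several qualifying runs, the mask alone keeps every row of every such run. One possible extension is sketched below. It assumes that each ID's rows are contiguous and already in the desired order; the function name `first_n_consecutive_ones` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def first_n_consecutive_ones(df, n=4, id_col="ID", mask_col="BINARY_MASK"):
    """Return the first n rows of the earliest run of >= n ones in each ID."""
    # A new run starts whenever the mask value or the ID differs from the previous row.
    changed = df[mask_col].ne(df[mask_col].shift()) | df[id_col].ne(df[id_col].shift())
    run_id = changed.cumsum()
    # Length of the run that each row belongs to.
    run_size = df.groupby(run_id)[mask_col].transform("size")
    # Rows that are 1 and sit inside a sufficiently long run.
    candidates = df[(df[mask_col] == 1) & (run_size >= n)]
    # The earliest qualifying run has at least n rows, so head(n) stays inside it.
    return candidates.groupby(id_col).head(n)


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ID": [101] * 8 + [102] * 6,
        "BINARY_MASK": [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14],
)

result = first_n_consecutive_ones(df, n=4)
print(result)
```

On the sample frame this keeps index labels 2, 3, 4, 5 for ID 101 and 8, 9, 11, 12 for ID 102. An ID without any run of at least n ones contributes no rows.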
Upvotes: 2
query and groupby with head
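The easiest approach is to filter for the ones before grouping. The filtering can be done in several ways; I chose query. Note that head(4) takes the first four 1s in each ID, whether or not they are consecutive, which is why row 0 appears in the result.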
df.query('BINARY_MASK == 1').groupby('ID').head(4)
ID BINARY_MASK
0 101 1
2 101 1
3 101 1
4 101 1
8 102 1
9 102 1
11 102 1
12 102 1
Upvotes: 3
Use groupby + head:
df[df['BINARY_MASK']==1].groupby('ID').head(4)
ID BINARY_MASK
0 101 1
2 101 1
3 101 1
4 101 1
8 102 1
9 102 1
11 102 1
12 102 1
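The two filter-then-head spellings can be compared directly. The sketch below assumes the sample frame from the question and checks that `query` and boolean indexing select the same rows; the names `via_query` and `via_mask` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ID": [101] * 8 + [102] * 6,
        "BINARY_MASK": [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14],
)

# Keep only the rows equal to 1, then take the first four rows of each ID.
via_query = df.query("BINARY_MASK == 1").groupby("ID").head(4)
via_mask = df[df["BINARY_MASK"] == 1].groupby("ID").head(4)

# Both spellings select identical rows.
print(via_query.equals(via_mask))  # True

# Row 0 is kept even though the next row is 0: head(4) does not check adjacency.
print(via_query.index.tolist())  # [0, 2, 3, 4, 8, 9, 11, 12]
```

Because the zeros are dropped before grouping, this approach returns the first four 1s per ID; it matches the requested output only when those four happen to be adjacent.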
Upvotes: 1