Martijncl

Reputation: 105

How to get a subset of rows from a group in a pandas dataframe?

I have a DataFrame with an ID column and a binary column, as in the example below:

     ID    BINARY_MASK
0   101        1
1   101        0
2   101        1
3   101        1
4   101        1
5   101        1
6   101        0
7   101        1
8   102        1 
9   102        1
11  102        1
12  102        1
13  102        0 
14  102        0

What I want to do is get the first four consecutive entries that are 1, per ID group. The result I would like to see is the following:

     ID    BINARY_MASK
2   101        1
3   101        1
4   101        1
5   101        1
8   102        1 
9   102        1
11  102        1
12  102        1

The position within each group where the four consecutive ones start differs per group, as in the example. How do I do this?

I have tried the solution that was offered by Bill G in this question, but that didn't work for me.

I am working with pandas DataFrames and Python 3.6.

Upvotes: 3

Views: 2307

Answers (3)

jezrael

Reputation: 863791

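Create a helper Series that labels each run of consecutive equal values: compare the column with its shifted copy using ne (!=) and take the cumsum. Pass it to GroupBy.transform to get the size of each run, combine that with a second condition that the value is 1, and finally filter by boolean indexing: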

s = df['BINARY_MASK'].ne(df['BINARY_MASK'].shift()).cumsum()
m1 = df.groupby(s)['BINARY_MASK'].transform('size') >= 4
m2 = df['BINARY_MASK'] == 1

df = df[m1 & m2]
print(df)
     ID  BINARY_MASK
2   101            1
3   101            1
4   101            1
5   101            1
7   101            1
8   102            1
9   102            1
11  102            1
12  102            1
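
Note that the run labels above are computed over the whole column, so a run can cross from one ID into the next (row 7 belongs to ID 101 but joins the run of ones that starts ID 102), and runs longer than four are kept whole. A minimal sketch of a per-ID variant follows, assuming a unique index and that "first four consecutive" means the first four rows of the first run of at least four ones in each ID. The function name `first_run_of_ones` and its parameter `n` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data from the question (note the gap in the index at 10).
df = pd.DataFrame(
    {"ID": [101] * 8 + [102] * 6,
     "BINARY_MASK": [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0]},
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14],
)

def first_run_of_ones(df, n=4):
    """Return the first n rows of the first run of >= n ones per ID."""
    mask = df["BINARY_MASK"]
    ids = df["ID"]
    # Start a new run whenever the value changes OR the ID changes,
    # so that no run can span two IDs.
    run_id = (mask.ne(mask.shift()) | ids.ne(ids.shift())).cumsum()
    run_size = mask.groupby(run_id).transform("size")
    # Rows that are 1 and belong to a run of at least n rows.
    candidates = df[(mask == 1) & (run_size >= n)]
    # Keep only the earliest qualifying run in each ID.
    cand_run = run_id.loc[candidates.index]
    first_run = cand_run.groupby(candidates["ID"]).transform("min")
    # Trim runs longer than n down to their first n rows.
    return candidates[cand_run == first_run].groupby("ID").head(n)

result = first_run_of_ones(df)
print(result)
```

On the sample data this selects index labels 2, 3, 4, 5 for ID 101 and 8, 9, 11, 12 for ID 102, which matches the result requested in the question.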

Upvotes: 2

piRSquared

Reputation: 294576

query and groupby with head

The easiest approach is to keep only the rows that are 1 before grouping. The filtering can be done in several ways; I chose to use query.

df.query('BINARY_MASK == 1').groupby('ID').head(4)

     ID  BINARY_MASK
0   101            1
2   101            1
3   101            1
4   101            1
8   102            1
9   102            1
11  102            1
12  102            1
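
Filtering first and then taking head(4) returns the first four ones in each ID whether or not they are adjacent in the original frame (row 0 of ID 101 is picked even though row 1 is 0). A small sketch for checking this on the sample data is below; the names `picked`, `pos`, `span`, and `contiguous` are hypothetical, and a unique index is assumed.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"ID": [101] * 8 + [102] * 6,
     "BINARY_MASK": [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0]},
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14],
)

picked = df.query("BINARY_MASK == 1").groupby("ID").head(4)

# Position of every row within its ID group in the original frame.
pos = df.groupby("ID").cumcount()

# Number of positions covered by the picked rows of each ID
# (last position minus first position, plus one).
span = pos.loc[picked.index].groupby(picked["ID"]).agg(
    lambda p: p.max() - p.min() + 1
)

# The picked rows are adjacent only if they cover exactly as many
# positions as there are picked rows.
contiguous = span == picked.groupby("ID").size()
print(contiguous)
```

For this data the check is False for ID 101 and True for ID 102, so the one-liner meets the consecutive requirement only when the first four ones in a group happen to be adjacent.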

Upvotes: 3

Space Impact

Reputation: 13255

Use groupby + head:

df[df['BINARY_MASK']==1].groupby('ID').head(4)

     ID  BINARY_MASK
0   101            1
2   101            1
3   101            1
4   101            1
8   102            1
9   102            1
11  102            1
12  102            1

Upvotes: 1
