user8042669

How to Detect a Streak of Certain Values in a DataFrame?

In a pandas DataFrame, I want to find the start and end positions of each block of consecutive False values in a column. If a block contains just one False, I would like to get that position as both the start and the end.

Example:

import pandas as pd
df = pd.DataFrame({"a": [True, True, True, False, False, False, True, False, True]})
In[110]: df
Out[111]: 
       a
0   True
1   True
2   True
3  False
4  False
5  False
6   True
7  False
8   True

In this example, I would like to get the positions

`3`, `5`

and

`7`, `7`.

Upvotes: 2

Views: 201

Answers (1)

jezrael

Reputation: 863291

Use:

a = (df.a.cumsum()[~df.a]
         .reset_index()
         .groupby('a')['index']
         .agg(['first','last'])
         .values
         .tolist())
print(a)
[[3, 5], [7, 7]]
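
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the name `false_runs` is hypothetical (not part of pandas), and it assumes the input is a boolean Series. It resets the index first so the result holds positions even when the Series has a non-default index.

```python
import pandas as pd


def false_runs(flags: pd.Series) -> list:
    """Return [start, end] positions for each run of consecutive False values."""
    # Relabel rows 0..n-1 so the result contains positions, not index labels.
    flags = flags.reset_index(drop=True).astype(bool)
    # The running count of True values stays constant inside a run of False values,
    # so it serves as a label for each run once the True rows are filtered out.
    run_id = flags.cumsum()[~flags]
    # Group the positions of the False rows by run label and take first/last.
    positions = run_id.index.to_series()
    bounds = positions.groupby(run_id).agg(["first", "last"])
    return bounds.values.tolist()


df = pd.DataFrame({"a": [True, True, True, False, False, False, True, False, True]})
print(false_runs(df["a"]))  # [[3, 5], [7, 7]]
```

A run at the very start of the column also works, because the cumulative sum is 0 there and forms its own group.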

Explanation:

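First, take the cumulative sum with cumsum. True counts as 1 and False as 0, so the sum stays constant across each run of consecutive False values, which gives every run its own group label: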

print (df.a.cumsum())
0    1
1    2
2    3
3    3
4    3
5    3
6    4
7    4
8    5
Name: a, dtype: int32

Keep only the False rows by boolean indexing with the inverted column (~df.a):

print (df.a.cumsum()[~df.a])
3    3
4    3
5    3
7    4
Name: a, dtype: int32

Turn the index into a regular column with reset_index:

print (df.a.cumsum()[~df.a].reset_index())
   index  a
0      3  3
1      4  3
2      5  3
3      7  4

Group by the cumulative-sum label and aggregate the index column with first and last:

print (df.a.cumsum()[~df.a].reset_index().groupby('a')['index'].agg(['first','last']))
   first  last
a             
3      3     5
4      7     7

Finally, convert the result to a nested list:

print (df.a.cumsum()[~df.a].reset_index().groupby('a')['index'].agg(['first','last']).values.tolist())
[[3, 5], [7, 7]]
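
As a cross-check, the same boundaries can be found without groupby. The sketch below is an alternative, not the method used in the answer above: it pads the inverted column with False on both sides and looks at where the value steps up (a run begins) or down (a run has just ended). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [True, True, True, False, False, False, True, False, True]})

# 1 where the column is False, 0 where it is True.
is_false = ~df["a"].to_numpy(dtype=bool)
# Pad with False on both sides so runs touching either edge are still detected.
padded = np.concatenate(([False], is_false, [False])).astype(np.int8)
steps = np.diff(padded)

starts = np.flatnonzero(steps == 1)      # position where each run begins
ends = np.flatnonzero(steps == -1) - 1   # last position of each run (inclusive)
runs = np.column_stack((starts, ends)).tolist()
print(runs)  # [[3, 5], [7, 7]]
```

This version returns positions rather than index labels, so the two approaches agree only when the DataFrame has the default integer index.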

Upvotes: 2
