In a pandas DataFrame, I want to detect the start and end positions of each block of consecutive False values in a column. If a block contains just one False, I would like to get that single position as both start and end.
Example:
import pandas as pd
df = pd.DataFrame({"a": [True, True, True, False, False, False, True, False, True]})
In[110]: df
Out[111]:
a
0 True
1 True
2 True
3 False
4 False
5 False
6 True
7 False
8 True
In this example, I would like to get the positions `3`, `5` and `7`, `7`.
Upvotes: 2
Views: 201
Reputation: 863291
Use:
a = (df.a.cumsum()[~df.a]
         .reset_index()
         .groupby('a')['index']
         .agg(['first', 'last'])
         .values
         .tolist())
print(a)
[[3, 5], [7, 7]]
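Explanation:
First take the cumulative sum with cumsum. Each True increments the running total, so all False values in one consecutive block share the same value, and that value is unique to the block: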
print (df.a.cumsum())
0 1
1 2
2 3
3 3
4 3
5 3
6 4
7 4
8 5
Name: a, dtype: int32
Keep only the False rows by boolean indexing with the inverted boolean column:
print (df.a.cumsum()[~df.a])
3 3
4 3
5 3
7 4
Name: a, dtype: int32
Turn the index into a column with reset_index:
print (df.a.cumsum()[~df.a].reset_index())
index a
0 3 3
1 4 3
2 5 3
3 7 4
Group by the cumulative-sum value and aggregate the index column with agg, using the functions first and last:
print (df.a.cumsum()[~df.a].reset_index().groupby('a')['index'].agg(['first','last']))
first last
a
3 3 5
4 7 7
Finally, convert the result to a nested list:
print (df.a.cumsum()[~df.a].reset_index().groupby('a')['index'].agg(['first','last']).values.tolist())
[[3, 5], [7, 7]]
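The steps above can be wrapped in a small helper. This is only a sketch: the name `false_blocks` is hypothetical, and it assumes you want 0-based positions rather than index labels, so it resets the index first. The one-liner above relies on the default RangeIndex and on the reset column being called 'index', which may not hold for a DataFrame with a custom or named index.

```python
import pandas as pd


def false_blocks(mask: pd.Series) -> list:
    """Return [first, last] positions for each consecutive run of False values.

    Hypothetical helper, not part of pandas.
    """
    # Work on positions 0..n-1 so the result does not depend on index labels.
    mask = mask.reset_index(drop=True).astype(bool)
    # Each True bumps the running total, so every run of False shares one key.
    keys = mask.cumsum()[~mask]
    if keys.empty:
        # No False values at all, so there are no blocks to report.
        return []
    # Group the surviving positions by key; keep the first and last of each run.
    positions = keys.index.to_series()
    bounds = positions.groupby(keys.to_numpy()).agg(['first', 'last'])
    return bounds.to_numpy().tolist()


df = pd.DataFrame({"a": [True, True, True, False, False, False, True, False, True]})
print(false_blocks(df["a"]))  # [[3, 5], [7, 7]]
```

A block at the very start of the column also works, because its cumulative sum is simply 0: `false_blocks(pd.Series([False, False, True, False]))` gives `[[0, 1], [3, 3]]`.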
Upvotes: 2