Reputation: 4539
I have a series with True
and False
and need to find all groups of True
.
This means that I need to find the start index and end index of neighboring True
values.
The following code gives the intended result but is very slow, inefficient and clumsy.
import pandas as pd
def groups(ser):
g = []
flag = False
start = None
for idx, s in ser.items():
if flag and not s:
g.append((start, idx-1))
flag = False
elif not flag and s:
start = idx
flag = True
if flag:
g.append((start, idx))
return g
if __name__ == "__main__":
ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
print(ser)
g = groups(ser)
print("\ngroups of True:")
for start, end in g:
print("from {} until {}".format(start, end))
pass
output is:
0 True
1 True
2 False
3 False
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 False
12 True
13 False
14 True
groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14
There are similar questions out there but non is looking to find the indices of the group starts/ends.
Upvotes: 4
Views: 475
Reputation: 34086
You can use itertools
:
In [478]: from operator import itemgetter
...: from itertools import groupby
In [489]: a = ser[ser].index.tolist() # Create a list of indexes having `True` in `ser`
In [498]: for k, g in groupby(enumerate(a), lambda ix : ix[0] - ix[1]):
...: l = list(map(itemgetter(1), g))
...: print(f'from {l[0]} to {l[-1]}')
...:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
Upvotes: 2
Reputation: 150785
It's common to use cumsum
on the negation to check for consecutive blocks. For example:
for _,x in s[s].groupby((1-s).cumsum()):
print(f'from {x.index[0]} to {x.index[-1]}')
Output:
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
Upvotes: 3