user7431005

Reputation: 4539

Find groups of neighboring True values in a pandas Series

I have a Series of True and False values and need to find all groups of True. That is, I need the start index and end index of each run of neighboring True values.

The following code gives the intended result but is very slow, inefficient and clumsy.

import pandas as pd

def groups(ser):
    g = []

    flag = False
    start = None
    for idx, s in ser.items():
        if flag and not s:
            g.append((start, idx-1))
            flag = False
        elif not flag and s:
            start = idx
            flag = True
    if flag:
        g.append((start, idx))
    return g

if __name__ == "__main__":
    ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
    print(ser)

    g = groups(ser)
    print("\ngroups of True:")
    for start, end in g:
        print("from {} until {}".format(start, end))

The output is:

0      True
1      True
2     False
3     False
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11    False
12     True
13    False
14     True

groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14

There are similar questions out there, but none of them asks for the indices where each group starts and ends.

Upvotes: 4

Views: 475

Answers (2)

Mayank Porwal

Reputation: 34086

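You can use itertools.groupby. For a run of consecutive integers, the difference between each value's position in the list and the value itself is constant, so that difference works as a grouping key (this relies on the Series having an integer index):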

In [478]: from operator import itemgetter
     ...: from itertools import groupby

In [489]: a = ser[ser].index.tolist() # Create a list of indexes having `True` in `ser` 

In [498]: for k, g in groupby(enumerate(a), lambda ix : ix[0] - ix[1]):
     ...:     l = list(map(itemgetter(1), g))
     ...:     print(f'from {l[0]} to {l[-1]}')
     ...: 
from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
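
If you need the groups as data rather than printed lines, the same idea can be wrapped in a function. This is only a sketch: the name `true_runs` is made up for illustration, and it assumes a boolean Series with a sorted integer index in which neighboring rows differ by exactly 1.

```python
from itertools import groupby

import pandas as pd


def true_runs(ser):
    """Return (start, end) index labels for each run of consecutive True values.

    Hypothetical helper; assumes a sorted integer index with step 1.
    """
    # Index labels of the rows that are True.
    labels = ser[ser].index.tolist()
    runs = []
    # Within a run of consecutive integers, (position - label) is constant.
    for _, grp in groupby(enumerate(labels), key=lambda pair: pair[0] - pair[1]):
        block = [label for _, label in grp]
        runs.append((block[0], block[-1]))
    return runs


ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
result = true_runs(ser)
print(result)
```

For the Series from the question this should give the same five (start, end) pairs as the loop above.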

Upvotes: 2

Quang Hoang

Reputation: 150785

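It's common to use cumsum on the negation to find consecutive blocks. The cumulative sum of the negated Series only increases at False rows, so it stays constant across each run of True values and can serve as the group key. For example:

for _, x in ser[ser].groupby((~ser).cumsum()):
    print(f'from {x.index[0]} to {x.index[-1]}')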

Output:

from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
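
To see why this works, and to collect the bounds without a Python loop, here is a rough sketch. The variable names `key`, `true_labels`, `bounds`, and `runs` are my own for illustration. Unlike the itertools approach, it does not depend on the index being made of integers, since it groups by row order rather than by label arithmetic.

```python
import pandas as pd

ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)

# Running count of False values: constant within each run of True values.
key = (~ser).cumsum()

# Index labels of the True rows, as a Series so they can be aggregated.
true_labels = ser.index[ser.to_numpy()].to_series()

# First and last label of each run, grouped by the key at the True rows.
bounds = true_labels.groupby(key[ser]).agg(["first", "last"])

# Plain (start, end) tuples, one per run.
runs = list(bounds.itertuples(index=False, name=None))
print(runs)
```

The `bounds` DataFrame has one row per run, with the start label in the `first` column and the end label in the `last` column.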

Upvotes: 3
