Reputation: 121

Finding consecutive duplicates and listing their indexes of where they occur in python

I have a list in Python, for example:

mylist = [1,1,1,1,1,1,1,1,1,1,1,
        0,0,1,1,1,1,0,0,0,0,0,
        1,1,1,1,1,1,1,1,0,0,0,0,0,0]

My goal is to find where there are five or more zeros in a row and list the indexes where this happens. For example, the output for this list would be:

[17,21][30,35]

Here is what I have tried, based on other questions asked here:

import numpy as np

def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

runs = zero_runs(mylist)

This gives the output:

[0,10]
[11,12]
...

which is basically just a list of the indexes of all duplicates. How would I go about separating this data into what I need?

Upvotes: 6

Answers (4)

user3483203

Reputation: 51165

Your current attempt is very close. It returns all of the runs of consecutive zeros in an array, so all you need to do is add a check that filters out runs of fewer than 5 consecutive zeros.

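import numpy as np

def threshold_zero_runs(a, threshold):
    # 1 where a is 0, padded with a 0 at each end so every run has both edges.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)

    # Keep only the runs whose length (end - start) reaches the threshold.
    m = (np.diff(ranges, 1) >= threshold).ravel()
    return ranges[m]

threshold_zero_runs(mylist, 5)

array([[17, 22],
       [30, 36]], dtype=int64)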
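
Note that the end indexes in this output are exclusive: each row is a half-open range, so 22 and 36 are one past the last zero, whereas the question asks for the index of the last zero itself. A minimal sketch of that adjustment, assuming the same padding-and-diff approach; the name inclusive_zero_runs is only illustrative and not part of NumPy:

```python
import numpy as np

def inclusive_zero_runs(a, threshold):
    # Hypothetical helper: returns [first_index, last_index] pairs
    # for runs of zeros that are at least `threshold` long.
    a = np.asarray(a)
    iszero = np.concatenate(([0], (a == 0).astype(np.int8), [0]))
    # Each row is [start, stop) for one run of zeros.
    edges = np.flatnonzero(np.diff(iszero) != 0).reshape(-1, 2)
    long_enough = (edges[:, 1] - edges[:, 0]) >= threshold
    result = edges[long_enough]  # boolean indexing returns a copy
    result[:, 1] -= 1            # make the end index inclusive
    return result

mylist = [1,1,1,1,1,1,1,1,1,1,1,
          0,0,1,1,1,1,0,0,0,0,0,
          1,1,1,1,1,1,1,1,0,0,0,0,0,0]

print(inclusive_zero_runs(mylist, 5).tolist())
# [[17, 21], [30, 35]]
```

Subtracting 1 after filtering keeps the length check simple, since stop - start is exactly the run length for half-open ranges.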

Upvotes: 2

pault

Reputation: 43494

Another way is to use itertools.groupby and enumerate.

First find the zeros and the indices:

from operator import itemgetter
from itertools import groupby

zerosList = [
    list(map(itemgetter(0), g)) 
    for i, g in groupby(enumerate(mylist), key=itemgetter(1)) 
    if not i
]
print(zerosList)
#[[11, 12], [17, 18, 19, 20, 21], [30, 31, 32, 33, 34, 35]]

Now just filter zerosList:

runs = [[x[0], x[-1]] for x in zerosList if len(x) >= 5]
print(runs)
#[[17, 21], [30, 35]]

Upvotes: 0

Prune

Reputation: 77837

Shift the array by one position and compare the shifted version with the original. Where they do not match, you have a transition. You then only need to identify adjacent transitions that are at least 5 positions apart.

Can you take it from there?
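
One possible reading of this idea, sketched in plain Python: the "shift" is taken here to mean comparing each element with its predecessor (an assumption, since the answer does not name a specific operator), and the function name long_zero_runs and its min_length parameter are illustrative only.

```python
def long_zero_runs(values, min_length=5):
    # Hypothetical helper: inclusive [start, end] index pairs of zero runs.
    if not values:
        return []
    # A transition is any position whose value differs from the previous one,
    # i.e. where the list and its one-step-shifted copy disagree.
    transitions = [0]
    transitions += [i for i in range(1, len(values)) if values[i] != values[i - 1]]
    transitions.append(len(values))

    runs = []
    # Adjacent transitions bound one run of equal values: [start, stop).
    for start, stop in zip(transitions, transitions[1:]):
        if values[start] == 0 and stop - start >= min_length:
            runs.append([start, stop - 1])
    return runs

mylist = [1,1,1,1,1,1,1,1,1,1,1,
          0,0,1,1,1,1,0,0,0,0,0,
          1,1,1,1,1,1,1,1,0,0,0,0,0,0]

print(long_zero_runs(mylist))
# [[17, 21], [30, 35]]
```

The check on values[start] is needed because adjacent transitions that are far apart can just as well bound a run of ones.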

Upvotes: 0

Dani Mesejo

Reputation: 61910

You could use itertools.groupby, which identifies the contiguous groups in the list:

from itertools import groupby

lst = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

groups = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]

cursor = 0
result = []
for k, length in groups:
    if not k and length >= 5:
        result.append([cursor, cursor + length - 1])
    cursor += length

print(result)

Output

[[17, 21], [30, 35]]

Upvotes: 5
