user688635
user688635

Reputation:

Sequence of elements in a list satisfying a condition

Assume I have a list of this type:

#    0   1  2  3   4  5  6  7  8  9   10  11 -- list index
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]   

I want to find each index for which the value is the same for the n following indices.

I can do it (laboriously) this way:

def sub_seq(li,n):
    ans={}
    for x in set(li):
        ans[x]=[i for i,e in enumerate(li[:-n+1]) if all(x==y for y in li[i:i+n])]

    ans={k:v for k,v in ans.items() if v}

    return ans

li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1] 
for i in (5,4,3,2):
    print i, sub_seq(li,i)    

Prints:

5 {1: [5]}
4 {1: [5, 6]}
3 {1: [5, 6, 7]}
2 {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]}

Is there a better way to do this?

Upvotes: 9

Views: 619

Answers (3)

Raymond Hettinger
Raymond Hettinger

Reputation: 226199

Analyzing data is typically easier if you first convert it to a convenient form. In this case, a run-length-encoding would be a good starting point:

from itertools import groupby, accumulate
from collections import defaultdict

def sub_seq(li, n):
    d = defaultdict(list)
    rle = [(k, len(list(g))) for k, g in groupby(li)]
    endpoints = accumulate(size for k, size in rle)
    for end_index, (value, count) in zip(endpoints, rle):
        for index in range(end_index - count, end_index - n + 1):
            d[value].append(index)
    return dict(d)

Upvotes: 5

A. Rodas
A. Rodas

Reputation: 20679

As Raymond Hettinger points out in his answer, groupby makes easier to check consecutive values. If you also enumerate the list, you can keep the corresponding indices and add them to the dictionary (I use defaultdict to make the function as short as possible):

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]

def sub_seq(li, n):
    res = defaultdict(list)
    for k, g in groupby(enumerate(li), itemgetter(1)):
        l = list(map(itemgetter(0), g))
        if n <= len(l): res[k] += l[0:len(l)-n+1]
    return res

for i in (5,4,3,2):
    print i, sub_seq(li,i)

Which prints:

5 defaultdict(<type 'list'>, {1: [5]})
4 defaultdict(<type 'list'>, {1: [5, 6]})
3 defaultdict(<type 'list'>, {1: [5, 6, 7]})
2 defaultdict(<type 'list'>, {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]})

Upvotes: 1

placeybordeaux
placeybordeaux

Reputation: 2216

I personally think that this is a bit more readable, constructs less objects and I would guess runs faster.

li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]

results = []
i = 0
while i < len(li):
    j = i + 1
    while j < len(li) and li[i] == li[j]:
        j += 1
    results.append((i,li[i],j-i))
    i = j

print results #[(0, -1, 2), (2, 2, 2), (4, -1, 1), (5, 1, 5), (10, -1, 2)]

Upvotes: 0

Related Questions