Reputation:
Assume I have a list of this type:
# 0 1 2 3 4 5 6 7 8 9 10 11 -- list index
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]
I want to find each index for which the value is the same for the n
following indices.
I can do it (laboriously) this way:
def sub_seq(li,n):
ans={}
for x in set(li):
ans[x]=[i for i,e in enumerate(li[:-n+1]) if all(x==y for y in li[i:i+n])]
ans={k:v for k,v in ans.items() if v}
return ans
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
for i in (5,4,3,2):
print i, sub_seq(li,i)
Prints:
5 {1: [5]}
4 {1: [5, 6]}
3 {1: [5, 6, 7]}
2 {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]}
Is there a better way to do this?
Upvotes: 9
Views: 619
Reputation: 226199
Analyzing data is typically easier if you first convert it to a convenient form. In this case, a run-length-encoding would be a good starting point:
from itertools import groupby, accumulate
from collections import defaultdict
def sub_seq(li, n):
d = defaultdict(list)
rle = [(k, len(list(g))) for k, g in groupby(li)]
endpoints = accumulate(size for k, size in rle)
for end_index, (value, count) in zip(endpoints, rle):
for index in range(end_index - count, end_index - n + 1):
d[value].append(index)
return dict(d)
Upvotes: 5
Reputation: 20679
As Raymond Hettinger points out in his answer, groupby
makes easier to check consecutive values. If you also enumerate the list, you can keep the corresponding indices and add them to the dictionary (I use defaultdict
to make the function as short as possible):
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
li = [-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1]
def sub_seq(li, n):
res = defaultdict(list)
for k, g in groupby(enumerate(li), itemgetter(1)):
l = list(map(itemgetter(0), g))
if n <= len(l): res[k] += l[0:len(l)-n+1]
return res
for i in (5,4,3,2):
print i, sub_seq(li,i)
Which prints:
5 defaultdict(<type 'list'>, {1: [5]})
4 defaultdict(<type 'list'>, {1: [5, 6]})
3 defaultdict(<type 'list'>, {1: [5, 6, 7]})
2 defaultdict(<type 'list'>, {1: [5, 6, 7, 8], 2: [2], -1: [0, 10]})
Upvotes: 1
Reputation: 2216
I personally think that this is a bit more readable, constructs less objects and I would guess runs faster.
li=[-1, -1, 2, 2, -1, 1, 1, 1, 1, 1, -1, -1 ]
results = []
i = 0
while i < len(li):
j = i + 1
while j < len(li) and li[i] == li[j]:
j += 1
results.append((i,li[i],j-i))
i = j
print results #[(0, -1, 2), (2, 2, 2), (4, -1, 1), (5, 1, 5), (10, -1, 2)]
Upvotes: 0