Reputation: 617
Suppose I have a list like this:
lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]
I need to extract the elements which have a difference of one. I need final output to look like this:
[[3, 4, 5], [20, 21, 22, 23], [47, 48]]
Here is what I have tried so far for this particular problem which has yielded some progress but doesn't achieve 100% of what I need:
final_list = []
for i in range(len(lst)):
sub_list = []
for ii in range(i, len(lst)):
prev_num = lst[ii-1]
if lst[ii] - prev_num == 1:
# print(lst[ii], end=",")
sub_array.append(lst[ii])
else:
break
if sub_list:
final_list.append(sub_list)
print(final_list)
Output:
[[4, 5], [5], [21, 22, 23], [22, 23], [23], [48]]
Upvotes: 3
Views: 836
Reputation: 106598
You can use itertools.groupby
to group the items with a key function that subtracts each item value with an incremental counter, which would result in a fixed value for each group of consecutive integers:
from itertools import groupby, count
lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]
c = count()
final_list = [g for _, [*g] in groupby(lst, lambda t: t - next(c)) if len(g) > 1]
final_list
would become:
[[3, 4, 5], [20, 21, 22, 23], [47, 48]]
EDIT: If you'd rather have better performance than more concise code, you can look at @MadPhysicist's answer, which avoids overhead incurred from calling generator functions, and @don'ttalkjustcode's answer, which avoids creating lists until a pair of consecutive numbers are found. Combine the two answers and you get a solution that avoids both types of overhead:
out = []
prev = float('inf')
for i in lst:
if i - prev != 1:
current = None
elif current:
current.append(i)
else:
current = [prev, i]
out.append(current)
prev = i
Sample timing using @don'ttalkjustcode's benchmark code:
2716 μs Mad_Physicist
1554 μs dont_talk_just_code
1284 μs Mad_Physicist_x_dont_talk_just_code
Upvotes: 6
Reputation: 114320
You can step along the input list and start adding elements to an output list. Any time the difference is greater than 1, start a new output list. If the prior output is length 1, remove it:
out = []
current = []
prev = None
for i in lst:
if prev is None or i - prev != 1:
if len(current) > 1:
out.append(current)
current = []
current.append(i)
prev = i
if len(current) > 1:
out.append(current)
Upvotes: 2
Reputation: 42143
You can go through the elements and either add them as a new group or add them to the last group depending on whether it is one more than the last one added. When done, remove the single element groups.
lst = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]
groups = []
for n in lst:
if groups and n-1 == groups[-1][-1]:
groups[-1].append(n) # +1 element, add to last group
else:
groups.append([n]) # add as a new group
groups = [g for g in groups if len(g)>1] # remove single element groups
print(groups)
[[3, 4, 5], [20, 21, 22, 23], [47, 48]]
Upvotes: 1
Reputation: 10230
A zip
solution that adds each first pair of a streak as an inner list and then appends further elements of the streak:
out = []
inner = None
for x, y in zip(lst, lst[1:]):
if y - x == 1:
if inner:
inner.append(y)
else:
inner = [x, y]
out.append(inner)
else:
inner = None
Benchmark with a random list 1000 as long as your example and with same density:
5172 μs blhsing
2697 μs Mad_Physicist
7667 μs Osman_Mamun
1571 μs dont_talk_just_code
Benchmark code (Try it online!):
from timeit import timeit
from itertools import groupby, count
from random import sample
def blhsing(lst):
c = count()
return [g for _, [*g] in groupby(lst, lambda t: t - next(c)) if len(g) > 1]
def Mad_Physicist(lst):
out = []
current = []
prev = None
for i in lst:
if prev is None or i - prev > 1:
if len(current) > 1:
out.append(current)
current = []
current.append(i)
prev = i
if len(current) > 1:
out.append(current)
return out
def Osman_Mamun(lst):
out = []
for k, g in groupby(enumerate(lst), key=lambda i: i[1]-i[0]):
temp = list(g)
if len(temp) > 1:
out.append([i[1] for i in temp])
return out
def dont_talk_just_code(lst):
out = []
inner = None
for x, y in zip(lst, lst[1:]):
if y - x == 1:
if inner:
inner.append(y)
else:
inner = [x, y]
out.append(inner)
else:
inner = None
return out
lst_example = [1, 3, 4, 5, 8, 10, 14, 20, 21, 22, 23, 40, 47, 48]
funcs = blhsing, Mad_Physicist, Osman_Mamun, dont_talk_just_code
for _ in range(3):
lst = sorted(sample(range(max(lst_example) * 1000), len(lst_example) * 1000))
# lst = lst_example
results = []
for func in funcs:
t = timeit(lambda: results.append(func(lst)), number=100) / 100
print('%d μs ' % (t * 1e6), func.__name__)
assert all(result == results[0] for result in results), (list(map(len, results)), results)
print()
Upvotes: 2
Reputation: 2882
Using itertools.groupby
:
In [19]: out = []
In [20]: for k, g in groupby(enumerate(lst), key=lambda i: i[1]-i[0]):
...: temp = list(g)
...: if len(temp) > 1:
...: out.append([i[1] for i in temp])
...:
In [21]: out
Out[21]: [[3, 4, 5], [20, 21, 22, 23], [47, 48]]
Upvotes: 1