Reputation: 580
Given a very large (gigabytes) list of arbitrary objects (I've seen a similar solution to this for ints), can I either group it easily into sublists by equivalence? Either in-place or by generator which consumes the original list.
l0 = [A,B, A,B,B, A,B,B,B,B, A, A, A,B] #spaces for clarity
Desired result:
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
I wrote a looping version like so:
#find boundaries
b0 = []
prev = A
group = A
for idx, elem in enumerate(l0):
if elem == group:
b0.append(idx)
prev = elem
b0.append(len(l0)-1)
for idx, b in enumerate(b0):
try:
c = b0[idx+1]
except:
break
if c == len(l0)-1:
l1.append(l0[b:])
else:
l1.append(l0[b:c])
Can this be done as a generator gen0(l)
that will work like:
for g in gen(l0):
print g
....
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
....
etc?
EDIT: using python 2.6 or 2.7
EDIT: preferred solution, mostly based on the accepted answer:
def gen_group(f, items):
out = [items[0]]
while items:
for elem in items[1:]:
if f(elem, out[0]):
break
else:
out.append(elem)
for _i in out:
items.pop(0)
yield out
if items:
out = [items[0]]
g = gen_group(lambda x, y: x == y, l0)
for out in g:
print out
Upvotes: 1
Views: 115
Reputation: 5534
Here's a simple generator to perform your task:
def gen_group(L):
DELIMETER = "A"
out = [DELIMETER]
while L:
for ind, elem in enumerate(L[1:]):
if elem == DELIMETER :
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = [DELIMETER ]
The idea is to cut down the list and yield the sublists until there is nothing left. This assumes the list starts with "A" (DELIMETER
variable).
Sample output:
for out in gen_group(l0):
print out
Produces
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
['A']
timeit.timeit(s, number=100000)
is used to test each of the current answers, where s
is the multiline string of the code (listed below):
Trial 1 Trial 2 Trial 3 Trial 4 | Avg
This answer (s1): 0.08247 0.07968 0.08635 0.07133 0.07995
Dilara Ismailova (s2): 0.77282 0.72337 0.73829 0.70574 0.73506
John Coleman (s3): 0.08119 0.09625 0.08405 0.08419 0.08642
This answer is the fastest, but it is very close. I suspect the difference is the additional argument and anonymous function in John Coleman's answer.
s1="""l0 = ["A","B", "A","B","B", "A","B","B","B","B", "A", "A", "A","B"]
def gen_group(L):
out = ["A"]
while L:
for ind, elem in enumerate(L[1:]):
if elem == "A":
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = ["A"]
out =gen_group(l0)"""
s2 = """A, B = 'A', 'B'
x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))"""
s3 = """def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
g = subListGenerator(lambda x: x == 'A',items)"""
Upvotes: 1
Reputation: 2447
The following works in this case. You could change the l[0] != 'A'
condition to be whatever. I would probably pass it as an argument, so that you can reuse it somewhere else.
def gen(l_arg, boundary):
l = l_arg.copy() # Optional if you want to save memory
while l:
sub_list = [l.pop(0)]
while l and l[0] != boundary: # Here boundary = 'A'
sub_list.append(l.pop(0))
yield sub_list
It assumes that there is an 'A'
at the beginning of your list. And it copies the list, which isn't great when the list is in the range of Gb. you could remove the copy to save memory if you don't care about keeping the original list.
Upvotes: 1
Reputation:
I assume that A is your breakpoint.
>>> A, B = 'A', 'B'
>>> x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
>>> map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
Upvotes: 2
Reputation: 52008
Maybe something like this:
def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
Used like:
>>> items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
>>> g = subListGenerator(lambda x: x == 'A',items)
>>> for x in g: print(x)
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
Upvotes: 2