jambox
jambox

Reputation: 580

How can I group a list of objects by continuity?

Given a very large (gigabytes) list of arbitrary objects (I've seen a similar solution to this for ints), can I either group it easily into sublists by equivalence? Either in-place or by generator which consumes the original list.

l0 = [A,B, A,B,B, A,B,B,B,B, A, A, A,B] #spaces for clarity

Desired result:

[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]

I wrote a looping version like so:

#find boundaries
b0 = []
prev = A
group = A
for idx, elem in enumerate(l0):
    if elem == group:
        b0.append(idx)
    prev = elem
b0.append(len(l0)-1)

for idx, b in enumerate(b0):
    try:
        c = b0[idx+1]
    except:
        break
    if c == len(l0)-1:
        l1.append(l0[b:])
    else:
        l1.append(l0[b:c])

Can this be done as a generator gen0(l) that will work like:

for g in gen(l0):
    print g
....
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
.... 

etc?

EDIT: using python 2.6 or 2.7

EDIT: preferred solution, mostly based on the accepted answer:

def gen_group(f, items):
    out = [items[0]]
    while items:
        for elem in items[1:]:
            if f(elem, out[0]):
                break
            else:
                out.append(elem)

        for _i in out:
            items.pop(0)
        yield out
        if items:
            out = [items[0]]

g = gen_group(lambda x, y: x == y, l0)

for out in g:
    print out

Upvotes: 1

Views: 115

Answers (4)

wnnmaw
wnnmaw

Reputation: 5534

Here's a simple generator to perform your task:

def gen_group(L):
    DELIMETER = "A"
    out = [DELIMETER]
    while L:
        for ind, elem in enumerate(L[1:]):
            if elem == DELIMETER :
                break
            else:
                out.append(elem)

        for i in range(ind + 1):
            L.pop(0)

        yield out
        out = [DELIMETER ]

The idea is to cut down the list and yield the sublists until there is nothing left. This assumes the list starts with "A" (DELIMETER variable).

Sample output:

for out in gen_group(l0):
    print out

Produces

['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
['A']

Comparitive Timings:

timeit.timeit(s, number=100000) is used to test each of the current answers, where s is the multiline string of the code (listed below):

                       Trial 1  Trial 2  Trial 3  Trial 4 |  Avg
This answer (s1):      0.08247  0.07968  0.08635  0.07133   0.07995
Dilara Ismailova (s2): 0.77282  0.72337  0.73829  0.70574   0.73506
John Coleman (s3):     0.08119  0.09625  0.08405  0.08419   0.08642

This answer is the fastest, but it is very close. I suspect the difference is the additional argument and anonymous function in John Coleman's answer.

s1="""l0 = ["A","B", "A","B","B", "A","B","B","B","B", "A", "A", "A","B"]

def gen_group(L):
    out = ["A"]
    while L:
        for ind, elem in enumerate(L[1:]):
            if elem == "A":
                break
            else:
                out.append(elem)

        for i in range(ind + 1):
            L.pop(0)

        yield out
        out = ["A"]

out =gen_group(l0)"""

s2 = """A, B = 'A', 'B'
x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))"""

s3 = """def subListGenerator(f,items):
    i = 0
    n = len(items)
    while i < n:
        sublist = [items[i]]
        i += 1
        while i < n and not f(items[i]):
            sublist.append(items[i])
            i += 1
        yield sublist

items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
g = subListGenerator(lambda x: x == 'A',items)"""

Upvotes: 1

ursan
ursan

Reputation: 2447

The following works in this case. You could change the l[0] != 'A' condition to be whatever. I would probably pass it as an argument, so that you can reuse it somewhere else.

def gen(l_arg, boundary):
    l = l_arg.copy()    # Optional if you want to save memory
    while l:
        sub_list = [l.pop(0)]
        while l and l[0] != boundary:   # Here boundary = 'A'
            sub_list.append(l.pop(0))
        yield sub_list

It assumes that there is an 'A' at the beginning of your list. And it copies the list, which isn't great when the list is in the range of Gb. you could remove the copy to save memory if you don't care about keeping the original list.

Upvotes: 1

user6183958
user6183958

Reputation:

I assume that A is your breakpoint.

>>> A, B = 'A', 'B'
>>> x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
>>> map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]

Upvotes: 2

John Coleman
John Coleman

Reputation: 52008

Maybe something like this:

def subListGenerator(f,items):
    i = 0
    n = len(items)
    while i < n:
        sublist = [items[i]]
        i += 1
        while i < n and not f(items[i]):
            sublist.append(items[i])
            i += 1
        yield sublist

Used like:

>>> items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
>>> g = subListGenerator(lambda x: x == 'A',items)
>>> for x in g: print(x)

['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']

Upvotes: 2

Related Questions