user248237

Chunking a list by delimiter in Python

What is the correct way to chunk a list of the following form: ["record_a:", "x"*N, "record_b:", "y"*M, ...]? That is, a list where the start of each record is marked by a string ending in ":", and each record includes all the elements up to the next record. So the following list:

["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

would be split into:

[["record_a", "a", "b"], ["record_b", "1", "2", "3", "4"]]

The list contains an arbitrary number of records, and each record contains an arbitrary number of items (up to where the next record begins, or to the end of the list if there are no more records). How can this be done efficiently?

Upvotes: 0

Views: 224

Answers (4)

root

Reputation: 80386

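from itertools import groupby, chain

l = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

# Python 3: itertools.izip is gone; the built-in zip is already lazy.
# zip(*[gen]*2) pairs each header group with the item group that follows it.
[list(chain([x[0][0].strip(':')], x[1])) for x in zip(*[(list(g)
            for _, g in groupby(l, lambda x: x.endswith(':')))]*2)]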

out:

[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
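
Unpacked, the one-liner does three things: it groups consecutive items by whether they end in ":", pairs each header group with the group that follows it, and joins each pair. A rough Python 3 sketch of the same idea, spelled out step by step (the name `chunk_by_groupby` and the intermediate variable names are mine, not part of the answer above):

```python
from itertools import groupby

def chunk_by_groupby(items):
    # Collapse consecutive items into runs; header runs and item runs alternate.
    runs = [list(g) for _, g in groupby(items, key=lambda s: s.endswith(":"))]
    # Pair run 0 with run 1, run 2 with run 3, and so on.
    pairs = zip(runs[0::2], runs[1::2])
    # Drop the trailing ":" from the header and put it in front of its items.
    return [[header[0][:-1]] + body for header, body in pairs]

l = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
print(chunk_by_groupby(l))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```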

Upvotes: 1

DSM

Reputation: 353359

Okay, here's my end-of-work-day crazy itertools solution:

>>> from itertools import groupby
>>> d = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> groups = (list(g) for _, g in groupby(d, lambda x: x.endswith(":")))
>>> git = iter(groups)
>>> paired = zip(git, git)  # pairs each header group with the item group after it
>>> combined = [[a[0][:-1]] + b for a, b in paired]
>>> 
>>> combined
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]

(Done more as an example of the sorts of things one can do than as a piece of code I'd necessarily use.)
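
One caveat with pairing up groupby runs: it assumes every header is followed by at least one item. Two headers in a row land in the same run, so only the first one is used and the second is silently dropped. A small sketch that shows this (the `pair_runs` helper is my own naming, written for Python 3):

```python
from itertools import groupby

def pair_runs(items):
    runs = (list(g) for _, g in groupby(items, key=lambda s: s.endswith(":")))
    # zip(runs, runs) pulls two consecutive runs from the same generator per pair.
    return [[head[0][:-1]] + body for head, body in zip(runs, runs)]

print(pair_runs(["record_a:", "a", "b", "record_b:", "1"]))
# [['record_a', 'a', 'b'], ['record_b', '1']]

print(pair_runs(["empty:", "record_b:", "1"]))
# [['empty', '1']]  <- "record_b" is gone and its item is filed under "empty"
```

If empty records can occur in the data, a plain loop that starts a new record on every header avoids the problem.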

Upvotes: 1

Martijn Pieters

Reputation: 1123410

Use a generator:

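def chunkRecords(records):
    record = []
    for r in records:
        if r.endswith(':'):
            # a header starts a new record; emit the previous one first
            if record:
                yield record
            record = [r[:-1]]  # drop the trailing ':'
        else:
            record.append(r)
    # emit the last record, which has no header after it
    if record:
        yield record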

Then loop over that:

for record in chunkRecords(records):
    # record is a list
    print(record)

or turn it into a list again:

records = list(chunkRecords(records))

The latter results in:

>>> records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> records = list(chunkRecords(records))
>>> records
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]

Upvotes: 4

Emanuele Paolini

Reputation: 10172

lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
out = []
for x in lst:
    if x[-1] == ':':
        out.append([x[:-1]])  # start a new record, dropping the trailing ':'
    else:
        out[-1].append(x)
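
The same loop can be wrapped in a function so that it returns the nested list asked for in the question and fails clearly when the input does not start with a header. This is only a sketch: the name `chunk_records` and the decision to raise `ValueError` for an item with no header before it are my assumptions, not part of the answer above.

```python
def chunk_records(items):
    out = []
    for x in items:
        if x.endswith(":"):
            # A header opens a new record; store it without the trailing ":".
            out.append([x[:-1]])
        elif out:
            # A plain item belongs to the most recently opened record.
            out[-1].append(x)
        else:
            # Nothing has been opened yet, so there is no record to append to.
            raise ValueError(f"item {x!r} appears before any record header")
    return out

lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
print(chunk_records(lst))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```

Each element is visited once and appended once, so the whole thing is linear in the length of the list.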

Upvotes: 4
