Reputation:
What is the correct way to chunk a list of the following form: ["record_a:", "x"*N, "record_b:", "y"*M, ...], i.e. a list where the start of each record is denoted by a string ending in ":", and each record includes all the elements up to the next record? So the following list:
["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
would be split into:
[["record_a", "a", "b"], ["record_b", "1", "2", "3", "4"]]
The list contains an arbitrary number of records, and each record contains an arbitrary number of list items (up until the next record begins or there are no more records). How can this be done efficiently?
Upvotes: 0
Views: 224
Reputation: 80386
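from itertools import groupby, chain
l = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
# Zipping one generator with itself pairs each header group with its items.
# zip is used here because itertools.izip exists only in Python 2.
[list(chain([x[0][0].strip(':')], x[1])) for x in zip(*[(list(g)
    for _, g in groupby(l, lambda x: x.endswith(':')))]*2)]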
out:
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
Upvotes: 1
Reputation: 353359
Okay, here's my end-of-work-day crazy itertools solution:
>>> from itertools import groupby
>>> d = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> groups = (list(g) for _, g in groupby(d, lambda x: x.endswith(":")))
>>> git = iter(groups)
>>> # zip(git, git) pairs each header group with the item group after it;
>>> # calling next() inside a generator expression fails on Python 3.7+.
>>> paired = zip(git, git)
>>> combined = [[a[0][:-1]] + b for a, b in paired]
>>>
>>> combined
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
(Done more as an example of the sorts of things one can do than as a piece of code I'd necessarily use.)
Upvotes: 1
Reputation: 1123410
Use a generator:
def chunkRecords(records):
    record = []
    for r in records:
        if r[-1] == ':':
            if record:
                yield record
            record = [r[:-1]]
        else:
            record.append(r)
    if record:
        yield record
Then loop over that:
for record in chunkRecords(records):
    # record is a list
    print(record)
or turn it into a list again:
records = list(chunkRecords(records))
The latter results in:
>>> records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> records = list(chunkRecords(records))
>>> records
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
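One caveat: the `r[-1] == ':'` test raises an IndexError if the list contains an empty string. Below is a minimal sketch of the same generator using `str.endswith` instead, assuming empty strings can occur and should be treated as ordinary items. The name `chunk_records_safe` is made up for illustration.

```python
def chunk_records_safe(items):
    """Yield one list per record; a record starts at a string ending in ':'."""
    record = []
    for item in items:
        # endswith() is safe on empty strings, unlike item[-1]
        if item.endswith(':'):
            if record:
                yield record
            record = [item[:-1]]  # header without the trailing colon
        else:
            record.append(item)
    if record:
        yield record


print(list(chunk_records_safe(["record_a:", "", "b", "record_b:"])))
# [['record_a', '', 'b'], ['record_b']]
```

Like the original, this yields any items that appear before the first header as a record of their own.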
Upvotes: 4
Reputation: 10172
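lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
out = []
for x in lst:
    if x[-1] == ':':
        # start a new record; x[:-1] drops the trailing colon,
        # matching the output asked for in the question
        out.append([x[:-1]])
    else:
        out[-1].append(x)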
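Note that this loop assumes the first element is a record header; otherwise `out[-1]` is evaluated on an empty list and raises an IndexError. Here is a rough sketch of the same loop wrapped in a function, assuming you would rather get an explicit error in that case and want the trailing colon stripped as in the question's expected output. The name `split_records` is hypothetical.

```python
def split_records(items):
    """Group items into lists, starting a new list at each string ending in ':'."""
    out = []
    for x in items:
        if x.endswith(':'):
            out.append([x[:-1]])  # new record, header without the colon
        elif out:
            out[-1].append(x)     # item belongs to the most recent record
        else:
            # assumption: items before any header are treated as an error
            raise ValueError("item %r appears before any record header" % (x,))
    return out


lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
print(split_records(lst))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```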
Upvotes: 4