Grismar

Reputation: 31339

Chunking iterables, including generators

I have this solution for chunking iterables:

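def chunks(items, chunk_size):
    def get_chunk():
        # Yield up to chunk_size items; StopIteration is caught here because
        # letting it escape a generator raises RuntimeError (PEP 479).
        try:
            for _ in range(chunk_size):
                yield next(iterator)
        except StopIteration:
            return

    iterator = iter(items)
    # An empty chunk means the iterator is exhausted (walrus needs Python 3.8+).
    while chunk := list(get_chunk()):
        yield chunk


for c in chunks([1, 2, 3, 4, 5, 6, 7, 8], 3):
    print(c)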

It works well and, unlike some other solutions I found on SO, it also handles 'infinite' generators such as:

def natural_numbers():
    n = 0
    while True:
        yield (n := n + 1)


tens = chunks(natural_numbers(), 10)
for _ in range(5):
    print(next(tens))

However, I can't shake the feeling that it should be possible to do this without calling an inner function. Of course, I could define an external function and pass in chunk_size and the iterator, which would avoid redefining get_chunk() on every call of chunks(). But that would still have the overhead of calling that function for each chunk.
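For comparison, the external-function variant described above might look roughly like this. It is only a sketch: `_get_chunk` is a hypothetical name, and a new generator object is still created for every chunk, so the per-chunk call overhead remains.

```python
def _get_chunk(iterator, chunk_size):
    """Hypothetical module-level helper: yield up to chunk_size items."""
    for _ in range(chunk_size):
        try:
            yield next(iterator)
        except StopIteration:
            # End cleanly; an escaping StopIteration would become
            # RuntimeError inside a generator on Python 3.7+ (PEP 479).
            return


def chunks(items, chunk_size):
    iterator = iter(items)
    # get_chunk is no longer redefined, but it is still called once per chunk.
    while chunk := list(_get_chunk(iterator, chunk_size)):
        yield chunk


print(list(chunks(range(8), 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```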

Does anyone have a suggestion that avoids the function call, but still works for an iterable that cannot be indexed or sliced?

The main reason I use the function is to catch the StopIteration, which I don't think can be done in a generator expression without losing the last few items fetched before the exception, but perhaps I'm wrong about that.
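The suspicion about generator expressions can be checked directly. In this minimal sketch (`naive_chunk` is a hypothetical helper, assuming Python 3.7+), `next()` raising StopIteration inside the generator expression is converted to RuntimeError by PEP 479, so the partial chunk is lost rather than returned.

```python
def naive_chunk(iterator, chunk_size):
    # Hypothetical helper: a generator expression instead of a nested function.
    return list(next(iterator) for _ in range(chunk_size))


it = iter([1, 2, 3, 4, 5])
print(naive_chunk(it, 3))  # [1, 2, 3]

try:
    # Pulls 4 and 5, then next() raises StopIteration inside the genexpr.
    naive_chunk(it, 3)
except RuntimeError as exc:
    # The StopIteration became RuntimeError, and 4 and 5 are already consumed.
    print(type(exc).__name__, exc)
```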

Upvotes: 3

Views: 1007

Answers (2)

wim

Reputation: 362786

Using a while loop:

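def chunks(items, chunk_size):
    iterator = iter(items)
    done = False
    while not done:
        chunk = []
        for _ in range(chunk_size):
            try:
                chunk.append(next(iterator))
            except StopIteration:
                # Source exhausted: finish this (possibly partial) chunk, then stop.
                done = True
                break
        if chunk:  # skip the empty chunk when the length is a multiple of chunk_size
            yield chunk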

Using a for loop:

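def chunks(items, chunk_size):
    iterator = iter(items)
    chunk = []
    # The for loop handles StopIteration itself, so no try/except is needed.
    for element in iterator:
        chunk.append(element)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    # Emit the leftover partial chunk, if any.
    if chunk:
        yield chunk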

Keeping your original idea but removing the nested function:

from itertools import islice

def chunks(items, chunk_size):
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
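The `islice` version needs no `try`/`except` because `islice` simply stops early when the underlying iterator runs out, so `list(islice(...))` returns a short or empty list instead of raising. A quick check with an infinite iterator, using `itertools.count(1)` as a stand-in for `natural_numbers()` from the question:

```python
from itertools import count, islice


def chunks(items, chunk_size):
    iterator = iter(items)
    # islice stops at exhaustion: the last partial chunk is kept,
    # and an empty list ends the loop.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


tens = chunks(count(1), 10)
print(next(tens))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(next(tens))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print(list(chunks([1, 2, 3, 4, 5, 6, 7, 8], 3)))  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```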

Using a 3rd-party library:

>>> from more_itertools import chunked
>>> list(chunked([1, 2, 3, 4, 5, 6, 7, 8], 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]
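On Python 3.12 or later, the standard library also has `itertools.batched`, which yields tuples rather than lists. A sketch that prefers it when available and falls back to the `islice` loop otherwise; the `chunks` wrapper and the conversion to lists are assumptions made here to keep the output identical to the versions above.

```python
from itertools import islice

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    batched = None


def chunks(items, chunk_size):
    if batched is not None:
        # batched yields tuples; convert to lists to match the other versions.
        for batch in batched(items, chunk_size):
            yield list(batch)
    else:
        # Fallback for older Pythons: the islice loop.
        iterator = iter(items)
        while chunk := list(islice(iterator, chunk_size)):
            yield chunk


print(list(chunks([1, 2, 3, 4, 5, 6, 7, 8], 3)))  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```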

Upvotes: 2

Jamie Deith

Reputation: 714

If I'm not mistaken, more-itertools has chunked and ichunked for this (chunked yields lists; ichunked yields sub-iterables instead).

https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked

Upvotes: 1
