Reputation: 31339
I have this solution for chunking iterables:
def chunks(items, chunk_size):
    def get_chunk():
        try:
            for _ in range(chunk_size):
                yield next(iterator)
        except StopIteration:
            return False

    iterator = iter(items)
    while chunk := list(get_chunk()):
        yield chunk

for c in chunks([1, 2, 3, 4, 5, 6, 7, 8], 3):
    print(c)
It works well and, unlike some other solutions I found on SO, it also handles 'infinite' generators like:
def natural_numbers():
    n = 0
    while True:
        yield (n := n + 1)

tens = chunks(natural_numbers(), 10)
for _ in range(5):
    print(next(tens))
However, I can't shake the feeling that it should be possible to do it without the call to the internal function. Of course you could define an external function and pass in chunk_size and the iterator, which would avoid redefining get_chunk() on each call of chunks. But it would still have the overhead of calling that function for each chunk.
Does anyone have a suggestion that avoids the function call, but still works for an iterable that cannot be indexed or sliced?
The main reason I use the function is to be able to capture the StopIteration, which I don't think can be done in a generator comprehension without losing the last few items before the exception, but perhaps I'm wrong about that.
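To make the StopIteration concern concrete, here is a minimal sketch, assuming Python 3.7 or later (where PEP 479 is always in effect). The helper name take_with_genexp is hypothetical and only used for illustration. A StopIteration raised by next() inside a generator expression surfaces as a RuntimeError, and the items already pulled for the partial chunk are gone:

```python
def take_with_genexp(iterator, chunk_size):
    # Hypothetical helper: build one chunk with a bare generator expression,
    # without any try/except around next().
    return list(next(iterator) for _ in range(chunk_size))

it = iter([1, 2, 3, 4, 5])

# Enough items are available, so the first chunk is built normally.
first = take_with_genexp(it, 3)

# Only two items remain: next() raises StopIteration inside the generator
# expression, which Python turns into a RuntimeError.
try:
    second = take_with_genexp(it, 3)
except RuntimeError as exc:
    second = None
    error = exc

# The two items consumed before the failure (4 and 5) cannot be recovered.
remaining = list(it)
```

So the exception has to be caught inside the generator that calls next(), which is what the nested function does.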
Upvotes: 3
Views: 1007
Reputation: 362786
Using a while loop:
def chunks(items, chunk_size):
    iterator = iter(items)
    done = False
    while not done:
        chunk = []
        for _ in range(chunk_size):
            try:
                chunk.append(next(iterator))
            except StopIteration:
                done = True
                break
        if chunk:
            yield chunk
Using a for loop:
def chunks(items, chunk_size):
    iterator = iter(items)
    chunk = []
    for element in iterator:
        chunk.append(element)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
Keeping your original idea but removing the nested function:
from itertools import islice

def chunks(items, chunk_size):
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
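As a quick check that the islice version still copes with an endless source and with a short final chunk, here is a small sketch. It assumes Python 3.8+ for the walrus operator and uses itertools.count as a stand-in for the natural_numbers() generator from the question:

```python
from itertools import count, islice

def chunks(items, chunk_size):
    iterator = iter(items)
    # islice stops quietly when the iterator runs dry, so no StopIteration
    # handling is needed; an empty list ends the loop.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk

# Infinite source: only the chunks that are requested get computed.
tens = chunks(count(1), 10)
first_two = [next(tens) for _ in range(2)]

# Finite source whose length is not a multiple of chunk_size.
leftover = list(chunks(range(8), 3))
```

Here first_two holds the numbers 1 to 10 and 11 to 20, and leftover ends with the short chunk [6, 7].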
Using a 3rd-party library:
>>> from more_itertools import chunked
>>> list(chunked([1, 2, 3, 4, 5, 6, 7, 8], 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]
Upvotes: 2
Reputation: 714
If I'm not mistaken, more-itertools has chunked and ichunked for this.
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked
Upvotes: 1