Reputation: 2981
Given,
import itertools as it
def foo():
    idx = 0
    while True:
        yield idx
        idx += 1
k = foo()
When I use zip() as in the following,
>>> list(zip(k,[11,12,13]))
[(0, 11), (1, 12), (2, 13)]
and then immediately after,
>>> list(zip(k,[11,12,13]))
[(4, 11), (5, 12), (6, 13)]
Notice that the second zip should have started with (3, 11), but it jumped to (4, 11) instead. It's as if there is another hidden next(k) somewhere. This does not happen when I use it.islice:
>>> k = foo()
>>> list(it.islice(k,6))
[0, 1, 2, 3, 4, 5]
Notice that it.islice is not missing the 3 term.
I am using Python 3.8.
Upvotes: 4
Views: 219
Reputation: 363063
For the special case where one of the input iterables is sized, you can do a little better than zip:
import itertools as it
from collections.abc import Sized
def smarter_zip(*iterables):
    sized = [i for i in iterables if isinstance(i, Sized)]
    try:
        min_length = min(len(s) for s in sized)
    except ValueError:
        # can't determine a min length.. fall back to regular zip
        return zip(*iterables)
    return zip(*[it.islice(i, min_length) for i in iterables])
It uses islice to prevent zip from consuming more from each iterator than we know is strictly necessary. This smarter_zip will solve the problem for the case posed in the original question.
However, in the general case, there is no way to tell beforehand whether an iterator is exhausted or not (consider a generator yielding bytes arriving on a socket). If the shortest of the iterables is not sized, the original problem still remains. For solving the general case, you may want to wrap iterators in a class which remembers the last-yielded item, so that it can be recalled from memory if necessary.
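One way the wrapper idea could look is sketched below. The names Recallable, recall, and careful_zip are hypothetical, made up for this illustration; they are not part of the standard library. The sketch assumes you are willing to wrap the long-lived iterator once and reuse the wrapper across calls.

```python
class Recallable:
    """Hypothetical iterator wrapper that remembers the last item it yielded."""

    _MISSING = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._last = self._MISSING
        self._pending = False  # True when the last item should be replayed

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending:
            # Replay the remembered item instead of advancing the source.
            self._pending = False
            return self._last
        self._last = next(self._it)
        return self._last

    def recall(self):
        """Arrange for the last-yielded item to be yielded again."""
        if self._last is self._MISSING:
            raise ValueError("nothing has been yielded yet")
        self._pending = True


def careful_zip(*iterables):
    """Like zip, but gives items back to Recallable inputs on early exit."""
    # Recallable instances are used as they are, so their state is shared
    # between calls; everything else gets a plain iterator.
    sources = [i if isinstance(i, Recallable) else iter(i) for i in iterables]
    while sources:
        row = []
        for pos, src in enumerate(sources):
            try:
                row.append(next(src))
            except StopIteration:
                # Items already taken in this round would be lost; hand
                # them back to the sources that support it.
                for earlier in sources[:pos]:
                    if isinstance(earlier, Recallable):
                        earlier.recall()
                return
        yield tuple(row)


def foo():
    idx = 0
    while True:
        yield idx
        idx += 1


k = Recallable(foo())
first = list(careful_zip(k, [11, 12, 13]))
second = list(careful_zip(k, [11, 12, 13]))
print(first)
print(second)
```

With this sketch, the second call starts at (3, 11) rather than (4, 11), because the 3 that was pulled just before the list ran out is replayed. Plain iterators passed to careful_zip still lose an item, since only Recallable inputs can take one back.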
Upvotes: 1
Reputation: 531718
zip basically (and necessarily, given the design of the iterator protocol) works like this:
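# zip is actually a class, but we'll pretend it's a generator
# function for simplicity.
def zip(xs, ys):
    # zip doesn't require its arguments to be iterators, just iterable
    xs = iter(xs)
    ys = iter(ys)
    while True:
        try:
            x = next(xs)
            y = next(ys)
        except StopIteration:
            # Since Python 3.7 (PEP 479), a StopIteration escaping a
            # generator becomes a RuntimeError, so end explicitly.
            return
        yield x, y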
There is no way to tell if ys is exhausted before an element of xs is consumed, and the iterator protocol doesn't provide a way for zip to put x "back" in xs if next(ys) raises a StopIteration exception.
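A consequence of this left-to-right order is that the argument order matters. As a small sketch, reusing the foo generator from the question: if the finite list is passed first, it raises StopIteration before the generator is advanced, so nothing is consumed and lost. This only helps when you know which argument runs out first, and the tuples come out in swapped order.

```python
def foo():
    idx = 0
    while True:
        yield idx
        idx += 1


k = foo()
# The list is asked first in each round; once it is exhausted, zip stops
# without calling next(k) again.
first = list(zip([11, 12, 13], k))
second = list(zip([11, 12, 13], k))
print(first)   # [(11, 0), (12, 1), (13, 2)]
print(second)  # [(11, 3), (12, 4), (13, 5)]
```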
Upvotes: 4