reckoner

Reputation: 2981

Extra next() for generator in `zip()`?

Given,

import itertools as it
def foo():
    idx = 0
    while True:
        yield idx
        idx += 1

k = foo()

When I use zip() as in the following,

>>> list(zip(k,[11,12,13]))
[(0, 11), (1, 12), (2, 13)]

and then immediately after,

>>> list(zip(k,[11,12,13]))
[(4, 11), (5, 12), (6, 13)]

Notice that the second zip should have started with (3, 11), but it jumped to (4, 11) instead. It's as if there is a hidden extra next(k) somewhere. This does not happen when I use it.islice:

>>> k = foo()
>>> list(it.islice(k,6))
[0, 1, 2, 3, 4, 5]

Notice it.islice is not missing the 3 term.

I am using Python 3.8.

Upvotes: 4

Views: 219

Answers (2)

wim

Reputation: 363063

For the special case where one of the input iterables is sized, you can do a little better than zip:

import itertools as it
from collections.abc import Sized

def smarter_zip(*iterables):
    sized = [i for i in iterables if isinstance(i, Sized)]
    try:
        min_length = min(len(s) for s in sized)
    except ValueError:
        # can't determine a min length.. fall back to regular zip
        return zip(*iterables)
    return zip(*[it.islice(i, min_length) for i in iterables])

It uses islice to prevent zip from consuming more from each iterator than we know is strictly necessary. This smarter_zip will solve the problem for the case posed in the original question.
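To see the effect on the example from the question, here is a minimal sketch that applies the same islice cap by hand. It reuses foo from the question; the names values, first, and second are only for illustration.

```python
import itertools as it

def foo():
    # Same infinite counter as in the question.
    idx = 0
    while True:
        yield idx
        idx += 1

k = foo()
values = [11, 12, 13]

# Cap the generator at len(values) items, so zip can never pull a
# fourth item out of k before noticing that values has run out.
first = list(zip(it.islice(k, len(values)), values))
second = list(zip(it.islice(k, len(values)), values))

print(first)   # [(0, 11), (1, 12), (2, 13)]
print(second)  # [(3, 11), (4, 12), (5, 13)]
```

The second call now starts at 3, because islice stops after three items without asking k for another one.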

However, in the general case, there is no way to tell beforehand whether an iterator is exhausted or not (consider a generator yielding bytes arriving on a socket). If the shortest of the iterables is not sized, the original problem still remains. For solving the general case, you may want to wrap iterators in a class which remembers the last-yielded item, so that it can be recalled from memory if necessary.
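One possible shape for such a wrapper is sketched below. The class name Rewindable and its rewind() method are hypothetical, not part of the standard library, and the sketch assumes the caller knows that zip stopped because the other iterable ran out.

```python
class Rewindable:
    """Hypothetical iterator wrapper that can replay its most recent item."""

    _EMPTY = object()  # sentinel meaning "nothing yielded yet"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._last = self._EMPTY  # most recent item handed out
        self._replay = False      # True when the next call should replay it

    def __iter__(self):
        return self

    def __next__(self):
        if self._replay:
            self._replay = False
            return self._last
        self._last = next(self._it)
        return self._last

    def rewind(self):
        # Arrange for the last-yielded item to be returned again.
        if self._last is self._EMPTY:
            raise ValueError("nothing has been yielded yet")
        self._replay = True


def foo():
    # Same infinite counter as in the question.
    idx = 0
    while True:
        yield idx
        idx += 1

k = Rewindable(foo())
first = list(zip(k, [11, 12, 13]))   # zip pulls 3 from k, then discards it
k.rewind()                           # recall that discarded 3
second = list(zip(k, [11, 12, 13]))

print(first)   # [(0, 11), (1, 12), (2, 13)]
print(second)  # [(3, 11), (4, 12), (5, 13)]
```

Calling rewind() is only correct when zip really did drop an item. If the wrapped iterator was itself the one that ran out, the last-yielded item was already paired and replaying it would duplicate it.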

Upvotes: 1

chepner

Reputation: 531718

zip basically (and necessarily, given the design of the iterator protocol) works like this:

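# zip is actually a class, but we'll pretend it's a generator
# function for simplicity.
def zip(xs, ys):
    # zip doesn't require its arguments to be iterators, just iterable
    xs = iter(xs)
    ys = iter(ys)
    while True:
        try:
            x = next(xs)  # consumed before we know whether ys has anything left
            y = next(ys)
        except StopIteration:
            # Since PEP 479 (the default from Python 3.7), a StopIteration
            # escaping a generator becomes a RuntimeError, so return explicitly.
            return
        yield x, y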

There is no way to tell if ys is exhausted before an element of xs is consumed, and the iterator protocol doesn't provide a way for zip to put x "back" in xs if next(ys) raises a StopIteration exception.
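One consequence of the sketch above, shown here as an illustration rather than a general fix: because zip pulls from its arguments left to right, putting the finite iterable first means it is found to be exhausted before the generator is touched. The names first and second are only for illustration.

```python
def foo():
    # Same infinite counter as in the question.
    idx = 0
    while True:
        yield idx
        idx += 1

k = foo()

# The list comes first, so zip hits its StopIteration before calling next(k).
first = list(zip([11, 12, 13], k))
second = list(zip([11, 12, 13], k))

print(first)   # [(11, 0), (12, 1), (13, 2)]
print(second)  # [(11, 3), (12, 4), (13, 5)]
```

This only helps when you know which argument is the shortest, and the tuples come out in swapped order.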

Upvotes: 4
