ovgolovin

Reputation: 13410

izip_longest in itertools: How does raising IndexError inside the iterator work?

In this question @lazyr asks how the following code for the izip_longest iterator, taken from here, works:

def izip_longest_from_docs(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        yield counter()         # yields the fillvalue, or raises IndexError
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass
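
The recipe hinges on the default argument `counter`. It is the bound `pop` method of a single list holding `len(args) - 1` fill values, and that list is shared by every `sentinel()` generator because the default is evaluated once. The first `len(args) - 1` exhausted iterators therefore receive a fill value, and the last one triggers `IndexError`. A minimal sketch of just that mechanism, where `n_args` is a hypothetical stand-in for `len(args)`:

```python
fillvalue = '-'
n_args = 3  # hypothetical stand-in for len(args)

# One shared list; its bound pop method plays the role of `counter`.
counter = ([fillvalue] * (n_args - 1)).pop

results = []
for _ in range(n_args):
    try:
        results.append(counter())       # what each sentinel would yield
    except IndexError:                  # list is empty: last iterator finished
        results.append('IndexError')

print(results)  # ['-', '-', 'IndexError']
```

The outer `except IndexError` in the recipe is waiting for exactly that last call, which is why it cannot tell it apart from an `IndexError` raised by one of the input iterators.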

While trying to understand how it works, I ran into this question: "What if IndexError is raised inside one of the iterators that are passed to izip_longest as arguments?"

Then I wrote some testing code:

from itertools import izip_longest, repeat, chain, izip

def izip_longest_from_docs(*args, **kwds):
    # The code is exactly the same as shown above
    ....

def gen1():
    for i in range(5):
        yield i

def gen2():
    for i in range(10):
        if i==8:
            raise IndexError  # simulate an IndexError raised inside the iterator
        yield i

for i in izip_longest_from_docs(gen1(),gen2(), fillvalue = '-'):
    print('{i[0]} {i[1]}'.format(**locals()))

print('\n')

for i in izip_longest(gen1(),gen2(), fillvalue = '-'):
    print('{i[0]} {i[1]}'.format(**locals()))

It turned out that izip_longest from the itertools module and izip_longest_from_docs behave differently.

The output of the code above:

>>> 
0 0
1 1
2 2
3 3
4 4
- 5
- 6
- 7


0 0
1 1
2 2
3 3
4 4
- 5
- 6
- 7

Traceback (most recent call last):
  File "C:/..., line 31, in <module>
    for i in izip_longest(gen1(),gen2(), fillvalue = '-'):
  File "C:/... test_IndexError_inside iterator.py", line 23, in gen2
    raise IndexError
IndexError

So it is clear that izip_longest from itertools propagated the IndexError exception (as I think it should), while izip_longest_from_docs 'swallowed' it, because it took the exception as the sentinel's signal to stop iterating.

My question is: how did they work around IndexError propagation in the code in the itertools module?

Upvotes: 4

Views: 780

Answers (1)

agf

Reputation: 176910

In izip_longest_next, the C function behind izip_longest, no sentinel is used.

Instead, CPython keeps track of how many of the iterators are still active with a counter, and stops when the number active reaches zero.

If an error occurs, it ends iteration as if there are no iterators still active, and allows the error to propagate.

The code:

            item = PyIter_Next(it);
            if (item == NULL) {
                lz->numactive -= 1;
                if (lz->numactive == 0 || PyErr_Occurred()) {
                    lz->numactive = 0;
                    Py_DECREF(result);
                    return NULL;
                } else {
                    Py_INCREF(lz->fillvalue);
                    item = lz->fillvalue;
                    PyTuple_SET_ITEM(lz->ittuple, i, NULL);
                    Py_DECREF(it);
                }
            }
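
For readers who do not follow C, the counting idea can be sketched in pure Python. This is an illustration under my own assumptions, not the CPython source: `zip_longest_sketch`, `num_active` and `failing` are hypothetical names. Because Python separates normal exhaustion (`StopIteration`) from every other exception, no `PyErr_Occurred()`-style check is needed; any other error simply propagates.

```python
def zip_longest_sketch(*iterables, **kwds):
    # Illustrative only: mirrors the counting idea, not the actual CPython source.
    fillvalue = kwds.get('fillvalue')
    iterators = [iter(it) for it in iterables]
    num_active = len(iterators)
    if num_active == 0:
        return
    while True:
        row = []
        for i, it in enumerate(iterators):
            if it is None:              # this input finished on an earlier round
                row.append(fillvalue)
                continue
            try:
                row.append(next(it))
            except StopIteration:       # normal exhaustion only
                num_active -= 1
                if num_active == 0:
                    return              # every input is done: drop the partial row
                iterators[i] = None
                row.append(fillvalue)
            # Any other exception (e.g. IndexError) is not caught here,
            # so it reaches the caller unchanged.
        yield tuple(row)


def failing():
    yield 1
    raise IndexError  # simulated failure inside an input iterator


print(list(zip_longest_sketch('ABCD', 'xy', fillvalue='-')))
# [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```

Calling `list(zip_longest_sketch([1, 2, 3], failing()))` raises `IndexError` on the second round instead of ending quietly.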

The simplest solution I see:

def izip_longest_modified(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    class LongestExhausted(Exception):
        pass
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        try:
            yield counter()         # yields the fillvalue, or raises IndexError
        except IndexError:      # catch only the sentinel's own signal, not GeneratorExit
            raise LongestExhausted
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except LongestExhausted:
        pass
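
A quick check of the private-exception approach, written so that it runs on Python 2 or 3. The `izip` fallback and the names `izip_longest_private`, `gen1` and `gen2` are assumptions for illustration. The sentinel here catches only `IndexError`, so closing a suspended sentinel generator is not turned into `LongestExhausted` by accident.

```python
from itertools import chain, repeat

try:
    from itertools import izip      # Python 2
except ImportError:
    izip = zip                      # Python 3: the built-in zip is already lazy


class LongestExhausted(Exception):
    """Private stop signal that an input iterator cannot raise by accident."""


def izip_longest_private(*args, **kwds):
    fillvalue = kwds.get('fillvalue')

    def sentinel(counter=([fillvalue] * (len(args) - 1)).pop):
        try:
            yield counter()         # yields the fillvalue, or raises IndexError
        except IndexError:
            raise LongestExhausted  # translate only the sentinel's own signal

    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except LongestExhausted:
        pass


def gen1():
    for i in range(5):
        yield i


def gen2():
    for i in range(10):
        if i == 8:
            raise IndexError        # simulated failure inside an input iterator
        yield i


rows = []
propagated = False
try:
    for pair in izip_longest_private(gen1(), gen2(), fillvalue='-'):
        rows.append(pair)
except IndexError:
    propagated = True

print(rows[-1], propagated)  # ('-', 7) True
```

The eight rows match the output shown in the question, and the `IndexError` from `gen2` now reaches the caller instead of being mistaken for the end of iteration.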

Upvotes: 3
