Reputation: 3509
I have the following experimental code whose function is similar to the zip built-in. What it tries to do should be simple and clear: yield the zipped tuples one at a time until an IndexError occurs, at which point we stop the generator.
def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
However, when I tried to execute the following code, the IndexError was not caught but instead thrown by the generator:
gen = my_zip([1,2], ['a','b'])
print(list(next(gen)))
print(list(next(gen)))
print(list(next(gen)))
IndexError Traceback (most recent call last)
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <module>()
12 print(list(next(gen)))
13 print(list(next(gen)))
---> 14 print(list(next(gen)))
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <genexpr>(.0)
3 while True:
4 try:
----> 5 yield (arg[i] for arg in args)
6 except IndexError:
7 raise StopIteration
IndexError: list index out of range
Why is this happening?
Thanks @thefourtheye for providing a nice explanation for what's happening above. Now another problem occurs when I execute:
list(my_zip([1,2], ['a','b']))
This line never returns and seems to hang the machine. What's happening now?
Upvotes: 23
Views: 3285
Reputation: 59594
def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
IndexError is not caught because (arg[i] for arg in args) is a generator expression, which is not executed immediately but only when you start iterating over it. And you iterate over it in another scope, when you call list((arg[i] for arg in args)):
# get the generator which yields another generator on each iteration
gen = my_zip([1,2], ['a','b'])
# get the second generator `(arg[i] for arg in args)` from the first one
# then iterate over it: list((arg[i] for arg in args))
print(list(next(gen)))
On the first list(next(gen)), i equals 0.
On the second list(next(gen)), i equals 1.
On the third list(next(gen)), i equals 2. And here you get IndexError, raised in the outer scope, because the line is treated as list(arg[2] for arg in ([1,2], ['a','b'])).
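The deferred evaluation can be seen in isolation. Here is a minimal sketch (the names lazy, caught_on_creation and caught_on_iteration are made up for illustration): creating the generator expression raises nothing, so a try block around the creation has nothing to catch, and the IndexError only shows up once something iterates over it.

```python
args = ([1, 2], ['a', 'b'])
i = 2  # out of range for both lists

caught_on_creation = False
try:
    lazy = (arg[i] for arg in args)  # nothing is indexed yet
except IndexError:
    caught_on_creation = True

caught_on_iteration = False
try:
    list(lazy)  # arg[2] is evaluated here, outside the first try block
except IndexError:
    caught_on_iteration = True

print(caught_on_creation, caught_on_iteration)  # False True
```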
Upvotes: 2
Reputation: 239453
The yield yields a generator object every time, and creating those generator objects causes no problem at all. That is why the try...except in my_zip is not catching anything. The third time you executed it, the call was reduced to this (oversimplified for our understanding):
list(arg[2] for arg in args)
Now, observe carefully: list is iterating over this generator expression, not over the actual my_zip generator. list calls next on the generator object and arg[2] is evaluated, only to find that 2 is not a valid index for arg (which is [1, 2] in this case). So IndexError is raised, list does not handle it (it has no reason to handle it anyway), and the exception propagates to the caller.
As per the edit,
list(my_zip([1,2], ['a','b']))
will be evaluated like this. First, my_zip is called, and that gives you a generator object. Then list iterates over it. It calls next on it and gets another generator object, (arg[0] for arg in args). Since no exception or return is encountered, it calls next again to get another generator object, (arg[1] for arg in args), and it keeps on iterating. Remember, the yielded generators are never iterated, so we'll never get the IndexError. That is why the code runs forever.
You can confirm this by taking only the first ten items:
from itertools import islice
from pprint import pprint
pprint(list(islice(my_zip([1, 2], ["a", 'b']), 10)))
and you will get
[<generator object <genexpr> at 0x7f4d0a709678>,
<generator object <genexpr> at 0x7f4d0a7096c0>,
<generator object <genexpr> at 0x7f4d0a7099d8>,
<generator object <genexpr> at 0x7f4d0a709990>,
<generator object <genexpr> at 0x7f4d0a7095a0>,
<generator object <genexpr> at 0x7f4d0a709510>,
<generator object <genexpr> at 0x7f4d0a7095e8>,
<generator object <genexpr> at 0x7f4d0a71c708>,
<generator object <genexpr> at 0x7f4d0a71c750>,
<generator object <genexpr> at 0x7f4d0a71c798>]
So the code tries to build an infinite list of generator objects.
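One way to get the behaviour the question was after is to force the indexing to happen inside the try block, by building a tuple instead of yielding a lazy generator expression. This is only a sketch, and the name my_zip_eager is made up here. It also uses return rather than raise StopIteration, because since Python 3.7 (PEP 479) a StopIteration raised inside a generator body is converted into a RuntimeError.

```python
def my_zip_eager(*args):
    """Hypothetical variant of my_zip that indexes eagerly."""
    i = 0
    while True:
        try:
            # tuple(...) consumes the generator expression right here,
            # so an out-of-range index raises inside this try block.
            item = tuple(arg[i] for arg in args)
        except IndexError:
            return  # ends the generator cleanly
        yield item
        i += 1


print(list(my_zip_eager([1, 2], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]
```

Note that, unlike the built-in zip, this sketch would loop forever if called with no arguments, since an empty tuple never raises IndexError.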
Upvotes: 13
Reputation: 87064
Sorry, I'm not able to offer a coherent explanation regarding the failure to catch the exception. However, there's an easy way around it: use a for loop over the length of the shortest sequence:
def my_zip(*args):
    for i in range(min(len(arg) for arg in args)):
        yield (arg[i] for arg in args)
>>> gen = my_zip([1,2], ["a",'b','c'])
>>> print(list(next(gen)))
[1, 'a']
>>> print(list(next(gen)))
[2, 'b']
>>> print(list(next(gen)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
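One caveat worth checking with this approach: the yielded generator expressions still look up i lazily, so they only give the expected values if each one is consumed before the next is requested, as in the session above. The sketch below (my_zip_lazy and my_zip_tuples are names invented for the comparison) collects all rows first and only then reads them, which shows the difference between yielding a lazy generator expression and yielding a tuple.

```python
def my_zip_lazy(*args):
    # Same body as the function above: yields generator expressions.
    for i in range(min(len(arg) for arg in args)):
        yield (arg[i] for arg in args)


def my_zip_tuples(*args):
    # Hypothetical variant: build each tuple before yielding it.
    for i in range(min(len(arg) for arg in args)):
        yield tuple(arg[i] for arg in args)


# Exhaust the outer generator first, then read the inner generators.
lazy_rows = [list(row) for row in list(my_zip_lazy([1, 2], ['a', 'b', 'c']))]
eager_rows = list(my_zip_tuples([1, 2], ['a', 'b', 'c']))

print(lazy_rows)   # [[2, 'b'], [2, 'b']]
print(eager_rows)  # [(1, 'a'), (2, 'b')]
```

By the time the inner generators run, the loop has finished and i holds its last value, so every lazy row reads index 1. Yielding tuples avoids this.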
Upvotes: 1
Reputation: 44838
Try replacing yield (arg[i] for ...) with the following:
for arg in args:
    yield arg[i]
But if the arguments are numbers, that causes an exception, as 1[1] makes no sense. In that case I suggest replacing arg[i] with just arg.
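To see what the first replacement does in practice, here is a sketch that drops it into the question's loop (my_zip_flat is a made-up name, and return is used in place of raise StopIteration, which Python 3.7+ would turn into a RuntimeError under PEP 479). The IndexError is now caught, because the indexing happens inside the try block, but the values come out as one flat stream rather than as zipped tuples.

```python
def my_zip_flat(*args):
    """Hypothetical name: the question's loop with the suggested replacement."""
    i = 0
    while True:
        try:
            for arg in args:
                yield arg[i]  # indexing now happens inside the try block
        except IndexError:
            return  # used here instead of raise StopIteration
        i += 1


print(list(my_zip_flat([1, 2], ['a', 'b'])))  # [1, 'a', 2, 'b']
```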
Upvotes: 0