Reputation:
From what I understand, a for x in a_generator: foo(x)
loop in Python is roughly equivalent to this:
    try:
        while True:
            foo(next(a_generator))
    except StopIteration:
        pass
That suggests that something like this:
    for outer_item in a_generator:
        if should_inner_loop(outer_item):
            for inner_item in a_generator:
                foo(inner_item)
                if stop_inner_loop(inner_item): break
        else:
            bar(outer_item)
would do two things: loop over a_generator until it reaches some x where should_inner_loop(x) returns truthy, then loop over it in the inner for until stop_inner_loop(inner_item) returns true. Then, the outer loop resumes where the inner one left off.
From my admittedly not very good tests, it seems to behave as described above. However, I couldn't find anything in the spec guaranteeing that this behavior is consistent across interpreters. Is there anywhere that says or implies that I can be sure it will always be like this? Can it cause errors, or behave in some other way (i.e. do something other than what's described above)?
N.B. The code equivalent above is taken from my own experience; I don't know if it's actually accurate. That's why I'm asking.
Upvotes: 8
Views: 2512
Reputation: 9753
TL;DR: it is safe with CPython (but I could not find any specification of this), although it may not do what you want to do.
First, let's talk about your first assumption, the equivalence.
A for loop actually first calls iter() on the object, then repeatedly calls next() on the result until it gets a StopIteration.
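To make that concrete, here is a rough hand-written equivalent of a for loop. It is only a sketch: for_each and seen are names made up for this illustration, and the real loop is implemented in the interpreter rather than in Python code.

```python
def for_each(iterable, body):
    """Rough equivalent of: for item in iterable: body(item)."""
    iterator = iter(iterable)      # called once, before the loop starts
    while True:
        try:
            item = next(iterator)  # called once per iteration
        except StopIteration:
            break                  # the loop ends quietly when the iterator is done
        body(item)


seen = []
for_each((n * n for n in range(4)), seen.append)
print(seen)  # [0, 1, 4, 9]
```

Note that the try only wraps the next() call. In the version from the question, a StopIteration raised inside foo() would also be swallowed, which a real for loop does not do.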
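Here is the relevant bytecode (a low-level form of Python, used by the interpreter itself). This listing comes from an older CPython 3 release; newer versions emit somewhat different opcodes, but GET_ITER and FOR_ITER are still there: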
    >>> import dis
    >>> def f():
    ...     for x in y:
    ...         print(x)
    ...
    >>> dis.dis(f)
      2           0 SETUP_LOOP              24 (to 27)
                  3 LOAD_GLOBAL              0 (y)
                  6 GET_ITER
            >>    7 FOR_ITER                16 (to 26)
                 10 STORE_FAST               0 (x)
      3          13 LOAD_GLOBAL              1 (print)
                 16 LOAD_FAST                0 (x)
                 19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 22 POP_TOP
                 23 JUMP_ABSOLUTE            7
            >>   26 POP_BLOCK
            >>   27 LOAD_CONST               0 (None)
                 30 RETURN_VALUE
GET_ITER calls iter(y) (which itself calls y.__iter__()) and pushes its result on the stack (think of it as a bunch of local unnamed variables), then enters the loop at FOR_ITER, which calls next(<iterator>) (which itself calls <iterator>.__next__()), then executes the code inside the loop, and the JUMP_ABSOLUTE makes the execution come back to FOR_ITER.
Now, for the safety:
Here are the methods of a generator: https://hg.python.org/cpython/file/101404/Objects/genobject.c#l589
As you can see at line 617, the implementation of __iter__() is PyObject_SelfIter, which simply returns the object (i.e. the generator) itself.
So, when you nest the two loops, both iterate on the same iterator. And, as you said, they are just calling next() on it, so it's safe.
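A small sketch of what this means in practice. The trigger value 2 and the stop value 5 are arbitrary stand-ins for should_inner_loop and stop_inner_loop from the question:

```python
gen = (n for n in range(10))
assert iter(gen) is gen  # a generator is its own iterator

outer_seen, inner_seen = [], []
for outer in gen:
    if outer == 2:                 # stand-in for should_inner_loop()
        for inner in gen:          # same iterator, so it continues after 2
            inner_seen.append(inner)
            if inner == 5:         # stand-in for stop_inner_loop()
                break
    else:
        outer_seen.append(outer)

print(inner_seen)  # [3, 4, 5]
print(outer_seen)  # [0, 1, 6, 7, 8, 9]
```

The outer loop picks up at 6, right after the item on which the inner loop stopped.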
But be cautious: the inner loop will consume items that will not be consumed by the outer loop. Even if that is what you want to do, it may not be very readable.
If that is not what you want to do, consider itertools.tee(), which buffers the output of an iterator, allowing you to iterate over its output twice (or more). This is only efficient if the tee iterators stay close to each other in the output stream; if one tee iterator will be fully exhausted before the other is used, it's better to just call list on the iterator to materialize a list out of it.
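For illustration, a minimal sketch of itertools.tee() on a small generator (the variable names are made up for this example):

```python
import itertools

source = (n for n in range(5))
first, second = itertools.tee(source)  # two independent iterators over one source

head = [next(first), next(first)]  # advance only the first iterator
all_of_second = list(second)       # the second one still starts from the beginning
rest_of_first = list(first)        # the first one resumes where it stopped

print(head)           # [0, 1]
print(all_of_second)  # [0, 1, 2, 3, 4]
print(rest_of_first)  # [2, 3, 4]
```

After calling tee(), the original source should not be advanced directly any more, otherwise the tee iterators will miss items.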
Upvotes: 7
Reputation: 36023
It's not really an answer to your question, but I would recommend not doing this because the code isn't readable. It took me a while to see that you were using y twice, even though that's the entire point of your question. Don't make a future reader get confused by this. When I see a nested loop, I'm not expecting what you've done and my brain has trouble seeing it.
I would do it like this:
    def generator_with_state(y):
        state = 0
        for x in y:
            if isinstance(x, special_thing):
                state = 1
                continue
            elif state == 1 and isinstance(x, signal):
                state = 0
            yield x, state

    for x, state in generator_with_state(y):
        if state == 1:
            foo(x)
        else:
            bar(x)
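Here is a runnable sketch of the same idea. Start and Stop are hypothetical marker classes standing in for special_thing and signal, and the sample data is invented:

```python
class Start:
    """Hypothetical marker: switches the state on (stands in for special_thing)."""


class Stop:
    """Hypothetical marker: switches the state off (stands in for signal)."""


def generator_with_state(items):
    state = 0
    for x in items:
        if isinstance(x, Start):
            state = 1
            continue               # the start marker itself is not yielded
        elif state == 1 and isinstance(x, Stop):
            state = 0              # the stop marker is yielded with state 0
        yield x, state


stop = Stop()
data = [1, Start(), 2, 3, stop, 4]
tagged = list(generator_with_state(iter(data)))
print([state for _, state in tagged])  # [0, 1, 1, 0, 0]
```

One difference from the nested-loop version in the question: there, the item that ends the inner loop is still passed to foo(), whereas here it comes out with state 0.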
Upvotes: 2
Reputation: 81594
No, it's not safe (as in, we won't get the outcome that we might have expected).
Consider this:
    a = (_ for _ in range(20))
    for num in a:
        print(num)
Of course, we will get 0 to 19 printed.
Now let's add a bit of code:
    a = (_ for _ in range(20))
    for num in a:
        for another_num in a:
            pass
        print(num)
The only thing that will be printed is 0.
By the time that we get to the second iteration of the outer loop, the generator will already be exhausted by the inner loop.
We can also do this:
    a = (_ for _ in range(20))
    for num in a:
        for another_num in a:
            print(another_num)
If it was safe we would expect to get 0 to 19 printed 20 times, but we actually get only 1 to 19, printed once: the outer loop has already consumed 0, and the inner loop exhausts the rest of the generator on its first pass.
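The contrast with a re-iterable object such as range can be sketched like this (count_inner_iterations is a helper made up for this comparison):

```python
def count_inner_iterations(iterable):
    """Count how many times the body of the inner loop runs."""
    count = 0
    for _ in iterable:
        for _ in iterable:
            count += 1
    return count


# range hands out a fresh iterator on every iter() call: 20 * 20 runs
print(count_inner_iterations(range(20)))                 # 400
# a generator is its own iterator: the outer loop takes 0, the inner takes 1..19
print(count_inner_iterations(n for n in range(20)))      # 19
```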
Upvotes: 3