anon

Is looping through a generator in a loop over that same generator safe in Python?

From what I understand, a for x in a_generator: foo(x) loop in Python is roughly equivalent to this:

try:
    while True:
        foo(next(a_generator))
except StopIteration:
    pass

That suggests that something like this:

for outer_item in a_generator:
    if should_inner_loop(outer_item):
        for inner_item in a_generator:
            foo(inner_item)
            if stop_inner_loop(inner_item): break
    else:
        bar(outer_item)

would do two things:

  1. Not raise any exceptions, segfault, or anything like that
  2. Iterate over a_generator until it reaches some outer_item where should_inner_loop(outer_item) returns truthy, then continue over the same generator in the inner for loop until stop_inner_loop(inner_item) returns truthy. Then the outer loop resumes where the inner one left off.

From my admittedly limited tests, it seems to behave as described above. However, I couldn't find anything in the spec guaranteeing that this behavior is consistent across interpreters. Is there anything that says or implies that I can rely on it? Can it cause errors, or behave in some other way (i.e. do something other than what's described above)?


N.B. The code equivalent above is taken from my own experience; I don't know if it's actually accurate. That's why I'm asking.

Upvotes: 8

Views: 2512

Answers (3)

Valentin Lorentz

Reputation: 9753

TL;DR: it is safe with CPython (but I could not find any specification of this), although it may not do what you want to do.


First, let's talk about your first assumption, the equivalence.

A for loop actually first calls iter() on the object, then repeatedly calls next() on the result until it gets a StopIteration.
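As a rough sketch of that description (not the interpreter's actual implementation), the loop can be written out by hand. The helper name run_for_loop is hypothetical and exists only for illustration. One difference from the version in the question: the try wraps only the next() call, so a StopIteration raised inside the loop body would not be silently swallowed.

```python
def run_for_loop(iterable, body):
    """Hypothetical helper that mimics `for item in iterable: body(item)`."""
    iterator = iter(iterable)      # called once, before the first iteration
    while True:
        try:
            item = next(iterator)  # called once per iteration
        except StopIteration:
            break                  # the loop ends quietly
        body(item)


seen = []
run_for_loop((n * n for n in range(4)), seen.append)
print(seen)  # [0, 1, 4, 9]
```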

Here is the relevant bytecode (a low level form of Python, used by the interpreter itself):

>>> import dis
>>> def f():
...  for x in y:
...   print(x)
... 
>>> dis.dis(f)
  2           0 SETUP_LOOP              24 (to 27)
              3 LOAD_GLOBAL              0 (y)
              6 GET_ITER
        >>    7 FOR_ITER                16 (to 26)
             10 STORE_FAST               0 (x)

  3          13 LOAD_GLOBAL              1 (print)
             16 LOAD_FAST                0 (x)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 JUMP_ABSOLUTE            7
        >>   26 POP_BLOCK
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE

GET_ITER calls iter(y) (which itself calls y.__iter__()) and pushes the result on the stack (think of the stack as a bunch of unnamed local variables). Execution then enters the loop at FOR_ITER, which calls next(<iterator>) (which itself calls <iterator>.__next__()) and runs the code inside the loop. Finally, JUMP_ABSOLUTE sends execution back to FOR_ITER.


Now, for the safety:

Here are the methods of a generator: https://hg.python.org/cpython/file/101404/Objects/genobject.c#l589 As you can see at line 617, the implementation of __iter__() is PyObject_SelfIter, which simply returns the object (i.e. the generator) itself.

So, when you nest the two loops, both iterate on the same iterator. And, as you said, they are just calling next() on it, so it's safe.

But be cautious: the inner loop will consume items that will not be consumed by the outer loop. Even if that is what you want to do, it may not be very readable.
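The points above can be checked with a small experiment. This is only a sketch: the conditions outer_item == 2 and inner_item == 5 are made-up stand-ins for should_inner_loop and stop_inner_loop from the question.

```python
gen = (n for n in range(10))

# A generator is its own iterator, so both loops share one position.
same_object = iter(gen) is gen
print(same_object)  # True

outer_seen, inner_seen = [], []
for outer_item in gen:
    if outer_item == 2:              # stand-in for should_inner_loop
        for inner_item in gen:
            inner_seen.append(inner_item)
            if inner_item == 5:      # stand-in for stop_inner_loop
                break
    else:
        outer_seen.append(outer_item)

print(inner_seen)  # [3, 4, 5]
print(outer_seen)  # [0, 1, 6, 7, 8, 9]
```

The items 3, 4 and 5 never reach the outer loop, because the inner loop already pulled them from the shared generator.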

If that is not what you want to do, consider itertools.tee(), which buffers the output of an iterator, allowing you to iterate over its output twice (or more). This is only efficient if the tee iterators stay close to each other in the output stream; if one tee iterator will be fully exhausted before the other is used, it's better to just call list on the iterator to materialize a list out of it.
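A minimal sketch of both options, assuming a hypothetical generator function numbers() as the data source. In the first half the two tee iterators stay one item apart, so the buffer stays small. In the second half the inner loop needs the whole sequence for every outer item, so a list is the simpler choice.

```python
import itertools


def numbers():
    # Hypothetical generator function standing in for the real data source.
    yield from range(5)


# tee: two independent iterators over one underlying generator.
# After calling tee, the original generator should not be used directly.
lag, lead = itertools.tee(numbers())
next(lead, None)                   # move one copy a single item ahead
adjacent = list(zip(lag, lead))
print(adjacent)  # [(0, 1), (1, 2), (2, 3), (3, 4)]

# list: materialize once, then iterate as many times as needed.
items = list(numbers())
pairs = [(a, b) for a in items for b in items]
print(len(pairs))  # 25
```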

Upvotes: 7

Alex Hall

Reputation: 36023

It's not really an answer to your question, but I would recommend not doing this because the code isn't readable. It took me a while to see that you were using the same generator in both loops, even though that's the entire point of your question. Don't confuse a future reader with this. When I see a nested loop, I don't expect what you've done, and my brain has trouble seeing it.

I would do it like this:

def generator_with_state(y):
    state = 0
    for x in y:
        if isinstance(x, special_thing):
            state = 1
            continue
        elif state == 1 and isinstance(x, signal):
            state = 0
        yield x, state

for x, state in generator_with_state(y):
    if state == 1:
        foo(x)
    else:
        bar(x)

Upvotes: 2

DeepSpace

Reputation: 81594

No, it's not safe (as in, we won't get the outcome that we might have expected).

Consider this:

a = (_ for _ in range(20))
for num in a:
    print(num)

Of course, we will get 0 to 19 printed.

Now let's add a bit of code:

a = (_ for _ in range(20))
for num in a:
    for another_num in a:
        pass
    print(num)

The only thing that will be printed is 0. By the time that we get to the second iteration of the outer loop, the generator will already be exhausted by the inner loop.

We can also do this:

a = (_ for _ in range(20))
for num in a:
    for another_num in a:
        print(another_num)

If it were safe, we would expect to get 0 to 19 printed 20 times, but we actually get only 1 to 19 printed once: the outer loop consumes 0, and the inner loop exhausts the rest of the generator, for the same reason I mentioned above.
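If the goal really is to run the full inner sequence for every outer item, one possible approach is to give each loop its own generator. The factory name make_numbers is hypothetical, and the range is shortened to 3 to keep the output small.

```python
def make_numbers():
    # Hypothetical factory: every call returns a brand-new generator.
    return (n for n in range(3))


rows = []
for num in make_numbers():
    # A fresh generator per outer iteration, so nothing is shared.
    rows.append([another_num for another_num in make_numbers()])

print(rows)  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```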

Upvotes: 3
