user6573621

How does the Python for loop actually work?

I am curious to understand how Python for loops work under the hood. I tried to implement it somewhat like the following code snippet, is that how the for loop has been implemented?

my_list = [1, 2, 3, 4, 5]

# A list is iterable but is not itself an iterator, so create an iterator from it
iter_list = iter(my_list)

while True:
    try:
        print(next(iter_list))
    except StopIteration:
        break

Upvotes: 6

Views: 2338

Answers (2)

ShadowRanger

Reputation: 155584

As a supplement to what Martijn has already said, the closest you can get to a for loop implemented in Python, without using for, is converting:

for x in mylist:
    print(x)

to this:

# We don't have a real NULL as in C, but a guaranteed-unique object simulates it;
# this avoids invoking the exception machinery in the common case.
NULL = object()
iter_obj = iter(mylist)
while (x := next(iter_obj, NULL)) is not NULL:
    print(x)

The main differences from what you guessed are:

  1. Nothing beyond the next call, the assignment to the name, and the end-of-loop check is covered by the try block (if anything else raises StopIteration, it won't be caught by the for loop machinery)
  2. No break is used for the normal loop exit (important if an else: block is attached to the for; if you break, you skip it, and if you exit normally, it's invoked, both in real for loops and in this while simulation)
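
To illustrate the first point, here is a minimal sketch in which the loop body itself raises StopIteration. The names body, question_style, simulated_for and real_for are made up for this example. The question's version swallows the exception silently, while the sentinel-based version lets it propagate, just as a real for loop does:

```python
def body(x):
    # Hypothetical loop body that misbehaves on the second item.
    if x == 2:
        raise StopIteration("raised by the body, not by the iterator")
    return x


def question_style(items):
    """Loop as written in the question: the body runs inside the try block."""
    seen = []
    it = iter(items)
    while True:
        try:
            seen.append(body(next(it)))
        except StopIteration:
            break  # also swallows the StopIteration raised by body()
    return seen


def simulated_for(items):
    """Sentinel-based loop: only next() decides when the loop ends."""
    seen = []
    sentinel = object()
    it = iter(items)
    while (x := next(it, sentinel)) is not sentinel:
        seen.append(body(x))  # an exception here propagates to the caller
    return seen


def real_for(items):
    """The real for statement, for comparison."""
    seen = []
    for x in items:
        seen.append(body(x))
    return seen


print(question_style([1, 2, 3]))  # stops early without any error
```

Calling simulated_for([1, 2, 3]) or real_for([1, 2, 3]) raises StopIteration to the caller instead of ending the loop quietly.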

Upvotes: 0

Martijn Pieters

Reputation: 1124238

Yes, that's a good approximation of how the for loop construct is implemented. It certainly matches the for loop statement documentation:

The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator, in the order returned by the iterator. Each item in turn is assigned to the target list using the standard rules for assignments (see Assignment statements), and then the suite is executed. When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a StopIteration exception), the suite in the else clause, if present, is executed, and the loop terminates.

You only missed the "assigned to the target list using the standard rules for assignments" part; you'd have to use i = next(iter_list) and print(i) rather than printing the result of the next() call directly.
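
As a rough sketch of that adjustment (reusing the names from the question, plus a printed list that is only there to make the result easy to check), the loop with an explicit target assignment could look like this:

```python
my_list = [1, 2, 3, 4, 5]
iter_list = iter(my_list)
printed = []  # hypothetical helper list, only used to inspect the result

while True:
    try:
        i = next(iter_list)  # assign to the target, as `for i in my_list` does
    except StopIteration:
        break
    # The loop body sits outside the try block, so only next() is guarded.
    print(i)
    printed.append(i)
```

As with a real for loop, the target i stays bound to the last item after the loop ends.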

Python source code is compiled to bytecode, which the interpreter loop then executes. You can look at the bytecode for a for loop by using the dis module:

>>> import dis
>>> dis.dis('for i in mylist: pass')
  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_NAME                0 (mylist)
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               1 (i)
             10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

The various opcodes named are documented in the same dis module, and their implementation can be found in the CPython evaluation loop (look for the TARGET(<opcode>) switch targets); the above opcodes break down to:

  • SETUP_LOOP 12 marks the start of the suite, a block of statements, so the interpreter knows where to jump to in case of a break, and what cleanup needs to be done in case of an exception or return statement; the end of the loop is located 12 bytes of bytecode after this instruction (offset 14 here, just past POP_BLOCK).
  • LOAD_NAME 0 (mylist) loads the mylist variable value, putting it on the top of the stack (TOS in opcode descriptions).
  • GET_ITER calls iter() on the object on the TOS, then replaces the TOS with the result.
  • FOR_ITER 4 calls next() on the TOS iterator. If that gives a result, then that's pushed to the TOS. If there is a StopIteration exception, then the iterator is removed from TOS, and 4 bytes of bytecode are skipped to the POP_BLOCK opcode.
  • STORE_NAME 1 takes the TOS and puts it in the named variable, here that's i.
  • JUMP_ABSOLUTE 6 marks the end of the loop body; it tells the interpreter to go back up to bytecode offset 6, to the FOR_ITER instruction above. If we did something interesting in the loop, then that would happen after STORE_NAME, before the JUMP_ABSOLUTE.
  • POP_BLOCK removes the block bookkeeping set up by SETUP_LOOP and removes the iterator from the stack.

The >> markers indicate jump targets; they are visual cues that make those targets easier to spot when reading the opcode line that jumps to them.
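
Opcode names and offsets differ between CPython versions (SETUP_LOOP, for instance, no longer exists in CPython 3.8 and later), so the listing on your interpreter may not match the one above. A small sketch for checking what your own version emits, using dis.get_instructions; the filename "<example>" is just a placeholder:

```python
import dis

# Compile the same loop and collect the opcode names for this interpreter.
code = compile("for i in mylist: pass", "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# The exact list is version-dependent; GET_ITER and FOR_ITER are the two
# opcodes that carry the iterator protocol described above.
print(opnames)
```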

Upvotes: 9
