tobias_t
tobias_t

Reputation: 25

Evaluation order of augmented operators (delimiters) in python

If I evaluate the following minimal example in python

a = [1, 2, 3]
a[-1] += a.pop()

I get

[1, 6]

So it seems that this is evaluated as

a[-1] = a[-1] + a.pop()

where each expression/operand would be evaluated in the order

third = first + second

so that on the lefthand side a[-1] is the 2nd element while on the righthand side it is the 3rd.

a[1] = a[2] + a.pop()

Can someone explain to me how one could infer this from the docs? Apparently '+=' is lexically a delimiter that also performs an operation (see here). What does that imply for its evaluation order?

EDIT:

I tried to clarify my question in a comment. I'll include it here for reference.

I want to understand if augmented operators have to be treated in a special way (i.e. by expanding them) during lexical analysis, because you kind of have to duplicate an expression and evaluate it twice. This is not clear in the docs and I want to know where this behaviour is specified. Other lexical delimiters (e.g. '}') behave differently.

Upvotes: 2

Views: 119

Answers (2)

ShadowRanger
ShadowRanger

Reputation: 155536

rici did a great job showing what's happening under the hood in the CPython reference interpreter, but there's a much simpler "source of truth" here in the language spec, which guarantees this behavior for any Python interpreter (not just CPython, but PyPy, Jython, IronPython, Cython, etc.). In the language spec, under Chapter 6: Expressions, section 6.16, Evaluation Order, it specifies:

Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.

That second sentence sounds like an exception to the general rule, but it isn't; assignment with = (including augmented assignment with += or the like) is not an expression in Python (the walrus operator introduced in 3.8 is an expression, but it can only assign to bare names, so there is never anything to "evaluate" on the left side, it's purely storing there, never reading from it), it's a statement, and the assignment statement has its own rules for order of evaluation. Those rules for assignment specify:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

This confirms the second sentence from the Expression Evaluation Order documentation; the expression list (the thing to be assigned) is evaluated first, then assignments to the targets proceed from there. So by the language spec itself, a[-1] += a.pop() must completely evaluate a.pop() first (the "expression list"), then perform assignment.

This behavior is required by the language spec, and has been for some time, so it can be relied on no matter what Python interpreter you're using.

That said, I'd recommend against code that relies on these guarantees from Python. For one, when you switch to other languages, the rules differ (and in some cases, e.g. many similar cases in C and C++, varying by version of the standard, there are no "rules", and trying to mutate the same object in multiple parts of an expression produces undefined behavior), so growing to rely on Python's behavior will hamper your ability to use other languages. Beyond that, it's still going to be confusing as hell, and just slight changes will avoid the confusion, for example, in your case, changing:

a[-1] += a.pop()

to just:

x = a.pop()
a[-1] += x

which, while admittedly a two-liner and therefore inferior!!!, achieves the same result, with meaningless overhead, and greater clarity.

TL;DR: The Python language spec guarantees that the right-hand side of += is fully evaluated before the augmented assignment operation begins and any of the code on the left-hand side is evaluated. But for code clarity, any code that relies on that guarantee should probably be refactored to avoid said reliance.

Upvotes: 1

rici
rici

Reputation: 241891

Let me start with the question you asked at the end:

I want to understand if augmented operators have to be treated in a special way (i.e. by expanding them) during lexical analysis,

That one is simple; the answer is "no". A token is just a token and the lexical analyser just divides the input into tokens. As far as the lexical analyser is concerned, += is just a token, and that's what it returns for it.

By the way, the Python docs make a distinction between "operators" and "punctuation", but that's not really a significant difference for the current lexical analyser. It might have made sense in some previous incarnation of the parser based on operator-precedence parsing, in which an "operator" is a lexeme with associated precedence and associativity. But I don't know if Python ever used that particular parsing algorithm; in the current parser, both "operators" and "punctuation" are literal lexemes which appear as such in syntax rules. As you might expect, the lexical analyser is more concerned with the length of the tokens (<= and += are both two-character tokens) than with the eventual use inside the parser.

"Desugaring" -- the technical term for source tranformations which convert some language construct into a simpler construct -- is not usually performed either in the lexer or in the parser, although the internal workings of compilers are not subject to a Code of Conduct. Whether a language even has a desugaring component is generally considered an implementation detail, and may not be particularly visible; that's certainly true of Python. Python doesn't expose an interface to its tokeniser, either; the tokenizer module is a reimplementation in pure Python which does not produce exactly the same behaviour (although it's close enough to be a useful exploratory tool). But the parser is exposed in the ast module, which provides direct access to Python's own parser (at least in the CPython implementation), and that let's us see that no desugaring is done up to the point that the AST is constructed (note: requires Python3.9 for the indent option):

>>> import ast
>>> def showast(code):
...    print(ast.dump(ast.parse(code), indent=2))
...
>>> showast('a[-1] += a.pop()')
Module(
  body=[
    AugAssign(
      target=Subscript(
        value=Name(id='a', ctx=Load()),
        slice=UnaryOp(
          op=USub(),
          operand=Constant(value=1)),
        ctx=Store()),
      op=Add(),
      value=Call(
        func=Attribute(
          value=Name(id='a', ctx=Load()),
          attr='pop',
          ctx=Load()),
        args=[],
        keywords=[]))],
  type_ignores=[])

This produces exactly the syntax tree you would expect from the grammar, in which "augmented assignment" statements are represented as a specific production within assignment:

assignment:
    | single_target augassign ~ (yield_expr | star_expressions)

single_target is a single assignable expression (such as a variable or, as in this case, a subscripted array); augassign is one of the augmented assignment operators, and the rest are alternatives for the right-hand side of the assignment. (You can ignore the "fence" grammar operator ~.) The parse tree produced by ast.dump is pretty close to the grammar, and shows no desugaring at all:

                      
         --------------------------
         |         |              |
     Subscript    Add            Call
     ---------           -----------------
      |     |            |        |      |
      a    -1        Attribute   [ ]    [ ]
                      ---------
                       |     |
                       a   'pop'

The magic happens afterwards, which we can also see because the Python standard library also includes a disassembler:

>>> import dis
>>> dis.dis(compile('a[-1] += a.pop()', '--', 'exec'))
  1           0 LOAD_NAME                0 (a)
              2 LOAD_CONST               0 (-1)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                0 (a)
             10 LOAD_METHOD              1 (pop)
             12 CALL_METHOD              0
             14 INPLACE_ADD
             16 ROT_THREE
             18 STORE_SUBSCR
             20 LOAD_CONST               1 (None)
             22 RETURN_VALUE

As can be seen, trying to summarize the evaluation order of augmented assignment as "left-to-right" is just an approximation. Here's what actually happens, as revealed in the virtual machine code above:

  1. The target aggregate and its index are "computed" (lines 0 and 2), and then these two values are duplicated (line 4). (The duplication means that neither the target nor its subscript are evaluated twice.)

  2. Then the duplicated values are used to lookup the value of the element (line 6). So it's at this point that the value of a[-1] is evaluated.

  3. The right-hand side expression (a.pop()) is then evaluated (lines 8 through 12).

  4. These two values (both 3, in this case) are combined with INPLACE_ADD because this is an ADD augmented assignment. In the case of integers, there's no difference between INPLACE_ADD and ADD, because integers are immutable values. But the compiler doesn't know that the first operand is an integer. a[-1] could be anything, including another list. So it emits an operand which will trigger the use of the __iadd__ method instead of __add__, in case there is a difference.

  5. The original target and subscript, which have been patiently waiting on the stack since step 1, are then used to perform a subscripted store (lines 16 and 18. The subscript is still the subscript computed at line 2, -1. But at this point a[-1] refers to a different element of a. The rotate is needed to get the arguments for into the correct order. Because the normal order of evaluation for assignment is to evaluate the right-hand side first, the virtual machine assumes that the new value will be at the bottom of the stack, followed by the object and its subscript.

  6. Finally, None is returned as the value of the statement.

The precise workings of assignment and augmented assignment statements are documented in the Python reference manual. Another important source of information is the description of the __iadd__ special method. Evaluation (and evaluation order) for augmented assignment operations is sufficiently confusing that there is a Programming FAQ dedicated to it, which is worth reading carefully if you want to understand the exact mechanism.

Interesting though that information is, it's worth adding that writing programs which depend on details of the evaluation order inside an augmented assignment is not conducive to producing readable code. In almost all cases, augmented assignment which relies on non-obvious details of the procedure should be avoided, including statements such as the one that is the target of this question.

Upvotes: 4

Related Questions