Shredderroy
Shredderroy

Reputation: 2920

Does Python optimise away duplicate expressions in list comprehensions?

Consider the following list comprehension:

[s.strip() for s in ['a', 'b ', ' ', ' c'] if s.strip()]

Would s.strip() be computed twice, or would Python optimise such expressions internally and compute duplicate expressions only once? I know that Python is not a compiled language, but such a simple optimisation could even be inferred from the AST.

Thanks in advance.

Upvotes: 1

Views: 62

Answers (1)

jferard
jferard

Reputation: 8180

Is s.strip() called twice?

If you use CPython, yes. You can try the dis module to cehck the bytecode.

def f():
    return [s.strip() for s in ['a', 'b ', ' ', ' c'] if s.strip()]

With Python 3.7, you get the bytecode of the function and the list comprehension:

>>> import dis
>>> dis.dis(f)

  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f6ddc8ce8a0, file "temp.py", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               3 (('a', 'b ', ' ', ' c'))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f6ddc8ce8a0, file "temp.py", line 2>:
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                20 (to 26)
              6 STORE_FAST               1 (s)
              8 LOAD_FAST                1 (s)
             10 LOAD_METHOD              0 (strip)
             12 CALL_METHOD              0
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (s)
             18 LOAD_METHOD              0 (strip)
             20 CALL_METHOD              0
             22 LIST_APPEND              2
             24 JUMP_ABSOLUTE            4
        >>   26 RETURN_VALUE

(With Python <= 3.6, you have to write dis.dis(f.__code__.co_consts[1]) to get the bytecode of the list comprehension.)

As you see, the method strip is called twice (line 10-12 and 18-20).

Why Cpython does not make this optimisation?

such a simple optimisation could even be inferred from the AST.

Why do you expect s.strip() to be computed once? Because you known that this function is pure and especially that if s == t, then s.strip() == t.strip(). But there is, to the extent of my knowledge, no such concept in Python. That means that the interpreter has no way to say that the result will be the same.

A little example with a non pure function (yes, this is ugly):

>>> i=0
>>> def mystrip(s):
...     global i
...     i+=1
...     return s.strip() if i%2==0 else s
...

One call, mystrip returns the stripped string, the other, it returns the string:

>>> mystrip('a ')
'a '
>>> mystrip('a ')
'a'

Hence the following result:

>>> [mystrip(s) for s in ['a', 'b ', ' ', ' c'] if mystrip(s)]
['a', 'b', '', 'c']

if arguments are not stripped (hence True because all the string have at least one char), but the returned values are stripped.

Manual optimisation

If you want s.strip() to be evaluated once, you have to write:

>>> [t for t in (s.strip() for s in ['a', 'b ', ' ', ' c']) if t]

(The expression between parentheses is a generator.)

Other versions (see also @bruno desthuilliers comment under our question):

>>> [t for t in map(str.strip, ['a', 'b ', ' ', ' c']) if t]
>>> list(filter(None, (s.strip() for s in ['a', 'b ', ' ', ' c'])))
>>> filter(None, map(str.strip, ['a', 'b ', ' ', ' c']))

Upvotes: 2

Related Questions