Reputation: 2920
Consider the following list comprehension:
[s.strip() for s in ['a', 'b ', ' ', ' c'] if s.strip()]
Would s.strip()
be computed twice, or would Python optimise such expressions internally and compute duplicate expressions only once? I know that Python is not a compiled language, but such a simple optimisation could even be inferred from the AST.
Thanks in advance.
Upvotes: 1
Views: 62
Reputation: 8180
s.strip()
called twice?If you use CPython, yes. You can try the dis
module to cehck the bytecode.
def f():
return [s.strip() for s in ['a', 'b ', ' ', ' c'] if s.strip()]
With Python 3.7, you get the bytecode of the function and the list comprehension:
>>> import dis
>>> dis.dis(f)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f6ddc8ce8a0, file "temp.py", line 2>)
2 LOAD_CONST 2 ('f.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 3 (('a', 'b ', ' ', ' c'))
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7f6ddc8ce8a0, file "temp.py", line 2>:
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 20 (to 26)
6 STORE_FAST 1 (s)
8 LOAD_FAST 1 (s)
10 LOAD_METHOD 0 (strip)
12 CALL_METHOD 0
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (s)
18 LOAD_METHOD 0 (strip)
20 CALL_METHOD 0
22 LIST_APPEND 2
24 JUMP_ABSOLUTE 4
>> 26 RETURN_VALUE
(With Python <= 3.6, you have to write dis.dis(f.__code__.co_consts[1])
to get the bytecode of the list comprehension.)
As you see, the method strip
is called twice (line 10-12 and 18-20).
such a simple optimisation could even be inferred from the AST.
Why do you expect s.strip()
to be computed once? Because you known that this function is pure and especially that if s == t
, then s.strip() == t.strip()
. But there is, to the extent of my knowledge, no such concept in Python. That means that the interpreter has no way to say that the result will be the same.
A little example with a non pure function (yes, this is ugly):
>>> i=0
>>> def mystrip(s):
... global i
... i+=1
... return s.strip() if i%2==0 else s
...
One call, mystrip
returns the stripped string, the other, it returns the string:
>>> mystrip('a ')
'a '
>>> mystrip('a ')
'a'
Hence the following result:
>>> [mystrip(s) for s in ['a', 'b ', ' ', ' c'] if mystrip(s)]
['a', 'b', '', 'c']
if
arguments are not stripped (hence True
because all the string have at least one char), but the returned values are stripped.
If you want s.strip()
to be evaluated once, you have to write:
>>> [t for t in (s.strip() for s in ['a', 'b ', ' ', ' c']) if t]
(The expression between parentheses is a generator.)
Other versions (see also @bruno desthuilliers comment under our question):
>>> [t for t in map(str.strip, ['a', 'b ', ' ', ' c']) if t]
>>> list(filter(None, (s.strip() for s in ['a', 'b ', ' ', ' c'])))
>>> filter(None, map(str.strip, ['a', 'b ', ' ', ' c']))
Upvotes: 2