Reputation: 154
I have code that looks something like this:
def somefunction(somelist):
    for item in somelist:
        if len(item) > 10:
            pass  # do something
        elif len(item) > 6:
            pass  # do something
        elif len(item) > 3:
            pass  # do something
        else:
            pass  # do something
Since I am calling len(item) multiple times, is it inefficient to do it this way? Would it be preferable to write the code as follows, or are they EXACTLY the same in performance?
def somefunction(somelist):
    for item in somelist:
        x = len(item)
        if x > 10:
            pass  # do something
        elif x > 6:
            pass  # do something
        elif x > 3:
            pass  # do something
        else:
            pass  # do something
Upvotes: 1
Views: 1407
Reputation: 250931
The second approach is surely better, as the number of calls to len() is reduced:
In [16]: import dis
In [18]: lis=["a"*10000,"b"*10000,"c"*10000]*1000
In [19]: def first():
   ....:     for item in lis:
   ....:         if len(item)<100:
   ....:             pass
   ....:         elif 100<len(item)<200:
   ....:             pass
   ....:         elif 300<len(item)<400:
   ....:             pass
   ....:
In [20]: def second():
   ....:     for item in lis:
   ....:         x=len(item)
   ....:         if x<100:
   ....:             pass
   ....:         elif 100<x<200:
   ....:             pass
   ....:         elif 300<x<400:
   ....:             pass
   ....:
You can always time your code using the timeit module:
In [21]: %timeit first()
100 loops, best of 3: 2.03 ms per loop
In [22]: %timeit second()
1000 loops, best of 3: 1.66 ms per loop
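Outside IPython, roughly the same measurement can be made with the standard library's timeit module directly. This is only a sketch: the function names repeated_len and cached_len and the number/repeat values are arbitrary choices for illustration, and the absolute timings will vary by machine and Python version.

```python
import timeit

# Same test data as in the IPython session above.
items = ["a" * 10000, "b" * 10000, "c" * 10000] * 1000

def repeated_len():
    # Calls len() again in every branch that is reached.
    for item in items:
        if len(item) < 100:
            pass
        elif 100 < len(item) < 200:
            pass
        elif 300 < len(item) < 400:
            pass

def cached_len():
    # Calls len() once per item and reuses the result.
    for item in items:
        n = len(item)
        if n < 100:
            pass
        elif 100 < n < 200:
            pass
        elif 300 < n < 400:
            pass

# Take the best of 3 runs of 20 calls each to reduce noise.
t_repeated = min(timeit.repeat(repeated_len, number=20, repeat=3))
t_cached = min(timeit.repeat(cached_len, number=20, repeat=3))
print("repeated len(): %.4f s" % t_repeated)
print("cached len():   %.4f s" % t_cached)
```

Taking the minimum of several repeats is the usual way to reduce interference from other processes.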
Use dis.dis() to disassemble the Python bytecode into mnemonics:
In [24]: dis.dis(first)
  2           0 SETUP_LOOP             109 (to 112)
              3 LOAD_GLOBAL              0 (lis)
              6 GET_ITER
        >>    7 FOR_ITER               101 (to 111)
             10 STORE_FAST               0 (item)
  3          13 LOAD_GLOBAL              1 (len)
             16 LOAD_FAST                0 (item)
             19 CALL_FUNCTION            1
             22 LOAD_CONST               1 (100)
             25 COMPARE_OP               0 (<)
             28 POP_JUMP_IF_FALSE       34
  4          31 JUMP_ABSOLUTE            7
  5     >>   34 LOAD_CONST               1 (100)
             37 LOAD_GLOBAL              1 (len)
             40 LOAD_FAST                0 (item)
             43 CALL_FUNCTION            1
             46 DUP_TOP
             47 ROT_THREE
             48 COMPARE_OP               0 (<)
             51 JUMP_IF_FALSE_OR_POP    63
             54 LOAD_CONST               2 (200)
             57 COMPARE_OP               0 (<)
             60 JUMP_FORWARD             2 (to 65)
        >>   63 ROT_TWO
             64 POP_TOP
        >>   65 POP_JUMP_IF_FALSE       71
  6          68 JUMP_ABSOLUTE            7
  7     >>   71 LOAD_CONST               3 (300)
             74 LOAD_GLOBAL              1 (len)
             77 LOAD_FAST                0 (item)
             80 CALL_FUNCTION            1
             83 DUP_TOP
             84 ROT_THREE
             85 COMPARE_OP               0 (<)
             88 JUMP_IF_FALSE_OR_POP   100
             91 LOAD_CONST               4 (400)
             94 COMPARE_OP               0 (<)
             97 JUMP_FORWARD             2 (to 102)
        >>  100 ROT_TWO
            101 POP_TOP
        >>  102 POP_JUMP_IF_FALSE        7
  8         105 JUMP_ABSOLUTE            7
            108 JUMP_ABSOLUTE            7
        >>  111 POP_BLOCK
        >>  112 LOAD_CONST               0 (None)
            115 RETURN_VALUE
In [25]: dis.dis(second)
  2           0 SETUP_LOOP             103 (to 106)
              3 LOAD_GLOBAL              0 (lis)
              6 GET_ITER
        >>    7 FOR_ITER                95 (to 105)
             10 STORE_FAST               0 (item)
  3          13 LOAD_GLOBAL              1 (len)
             16 LOAD_FAST                0 (item)
             19 CALL_FUNCTION            1
             22 STORE_FAST               1 (x)
  4          25 LOAD_FAST                1 (x)
             28 LOAD_CONST               1 (100)
             31 COMPARE_OP               0 (<)
             34 POP_JUMP_IF_FALSE       40
  5          37 JUMP_ABSOLUTE            7
  6     >>   40 LOAD_CONST               1 (100)
             43 LOAD_FAST                1 (x)
             46 DUP_TOP
             47 ROT_THREE
             48 COMPARE_OP               0 (<)
             51 JUMP_IF_FALSE_OR_POP    63
             54 LOAD_CONST               2 (200)
             57 COMPARE_OP               0 (<)
             60 JUMP_FORWARD             2 (to 65)
        >>   63 ROT_TWO
             64 POP_TOP
        >>   65 POP_JUMP_IF_FALSE       71
  7          68 JUMP_ABSOLUTE            7
  8     >>   71 LOAD_CONST               3 (300)
             74 LOAD_FAST                1 (x)
             77 DUP_TOP
             78 ROT_THREE
             79 COMPARE_OP               0 (<)
             82 JUMP_IF_FALSE_OR_POP    94
             85 LOAD_CONST               4 (400)
             88 COMPARE_OP               0 (<)
             91 JUMP_FORWARD             2 (to 96)
        >>   94 ROT_TWO
             95 POP_TOP
        >>   96 POP_JUMP_IF_FALSE        7
  9          99 JUMP_ABSOLUTE            7
            102 JUMP_ABSOLUTE            7
        >>  105 POP_BLOCK
        >>  106 LOAD_CONST               0 (None)
            109 RETURN_VALUE
Upvotes: 2
Reputation: 88987
Python does not treat the two as equivalent, because they are not equivalent for an arbitrary function. Consider this function, x():
y = 1
def x():
    return 1
And these two tests:
>>> print(x() + y)
2
>>> print(x() + y)
2
And:
>>> hw = x()
>>> print(hw + y)
2
>>> print(hw + y)
2
These behave exactly the same. But what if our function has side effects?
y = 1
def x():
    global y
    y += 1
    return 1
The first case:
>>> print(x() + y)
3
>>> print(x() + y)
4
The second case:
>>> hw = x()
>>> print(hw + y)
3
>>> print(hw + y)
3
You can see that this optimization only works if the function has no side-effects, otherwise it can alter the program. As Python can't tell if a function has side-effects, it can't do this optimization.
As such, it makes sense to store the value locally and reuse it rather than calling the function again and again. In reality, the difference will be tiny and is highly unlikely to matter. Still, the cached version is more readable and avoids repeating yourself, so it's generally a good idea to code that way.
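The side-effect argument also applies to an if/elif chain like the one in the question. As a rough sketch, assume a made-up Sensor class (not from the question) whose read() method returns a different value on every call. Calling it in each branch and calling it once then give different answers, which is why the interpreter cannot safely merge the calls for you.

```python
class Sensor:
    """Hypothetical object whose reading changes every time it is polled."""

    def __init__(self):
        self.polls = 0

    def read(self):
        self.polls += 1          # side effect: state changes on every call
        return self.polls * 5    # returns 5, then 10, then 15, ...

def classify_repeated(sensor):
    # read() runs again in every branch that is reached.
    if sensor.read() > 10:
        return "high"
    elif sensor.read() > 6:
        return "medium"
    return "low"

def classify_cached(sensor):
    # read() runs exactly once; every branch sees the same value.
    value = sensor.read()
    if value > 10:
        return "high"
    elif value > 6:
        return "medium"
    return "low"

a, b = Sensor(), Sensor()
result_repeated = classify_repeated(a)
result_cached = classify_cached(b)
print(result_repeated, a.polls)  # medium 2
print(result_cached, b.polls)    # low 1
```

With len() on built-in types there is no such side effect, so both forms give the same result and only the number of calls differs.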
Upvotes: 1
Reputation: 7099
You can check such things with dis.dis:
import dis

def somefunction1(item):
    if len(item) > 10:
        print(1)
    elif len(item) > 10:
        print(2)

def somefunction2(item):
    x = len(item)
    if x > 10:
        print(1)
    elif x > 10:
        print(2)

print("#1")
dis.dis(somefunction1)
print("#2")
dis.dis(somefunction2)
Interpreting the output:
#1
  4           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (item)
              6 CALL_FUNCTION            1
              9 LOAD_CONST               1 (10)
             12 COMPARE_OP               4 (>)
             15 POP_JUMP_IF_FALSE       26
[...]
  6     >>   26 LOAD_GLOBAL              0 (len)
             29 LOAD_FAST                0 (item)
             32 CALL_FUNCTION            1
             35 LOAD_CONST               1 (10)
             38 COMPARE_OP               4 (>)
             41 POP_JUMP_IF_FALSE       52
[...]
#2
 10           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (item)
              6 CALL_FUNCTION            1
              9 STORE_FAST               1 (x)
 11          12 LOAD_FAST                1 (x)
             15 LOAD_CONST               1 (10)
             18 COMPARE_OP               4 (>)
             21 POP_JUMP_IF_FALSE       32
[...]
 13     >>   32 LOAD_FAST                1 (x)
             35 LOAD_CONST               1 (10)
             38 COMPARE_OP               4 (>)
             41 POP_JUMP_IF_FALSE       52
You can see that in the first example, len(item) is called twice (see the two CALL_FUNCTION instructions?), whereas it is only called once in the second implementation.
This means that the rest of your question boils down to how len() is implemented. It is O(1) (i.e. cheap) for built-in types such as lists, but for types you have built yourself, with their own __len__ method, it need not be.
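To make the last point concrete, here is a minimal sketch of a user-defined container whose length is not cheap. The LinkedList class and its len_calls counter are invented for illustration: __len__ walks every node, so each len() call is O(n), and the counter shows how many times each style triggers that walk.

```python
class LinkedList:
    """Hypothetical singly linked list; __len__ has to walk every node."""

    def __init__(self, values):
        self.len_calls = 0
        self.head = None
        # Build the chain back to front as nested (value, next) pairs.
        for value in reversed(list(values)):
            self.head = (value, self.head)

    def __len__(self):
        self.len_calls += 1      # record how often len() is requested
        count = 0
        node = self.head
        while node is not None:  # O(n) traversal on every call
            count += 1
            node = node[1]
        return count

def bucket_repeated(item):
    if len(item) > 10:
        return "long"
    elif len(item) > 6:
        return "medium"
    elif len(item) > 3:
        return "short"
    return "tiny"

def bucket_cached(item):
    n = len(item)
    if n > 10:
        return "long"
    elif n > 6:
        return "medium"
    elif n > 3:
        return "short"
    return "tiny"

first, second = LinkedList(range(5)), LinkedList(range(5))
print(bucket_repeated(first), first.len_calls)   # short 3
print(bucket_cached(second), second.len_calls)   # short 1
```

Both versions return the same bucket, but the repeated form walks the list three times for this input, while the cached form walks it once.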
Upvotes: 1
Reputation: 5683
len() is an O(1) operation for built-in containers, which means calling len() is very cheap. So stop worrying about it and focus on improving other parts of your code.
That said, I personally think the second way is better: if you rename the variable from x to length, the code becomes more readable.
def somefunction(somelist):
    for item in somelist:
        length = len(item)
        if length > 10:
            pass  # do something
        elif length > 6:
            pass  # do something
        elif length > 3:
            pass  # do something
        else:
            pass  # do something
NOTE: len() is O(1) for strings, sets, and dictionaries (and likewise for lists and tuples).
Upvotes: 2
Reputation: 39451
Python doesn't automatically optimize things the way most other languages do (unless you're using PyPy), so the second version is probably faster. But unless item has a custom __len__ implementation that takes a while, it probably won't speed things up much either. This is the sort of micro-optimization that should be reserved for tight inner loops after profiling has indicated a problem.
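One possible way to do that profiling is with the standard library's cProfile and pstats modules. The sketch below assumes a stand-in function named classify and made-up test data; in real code you would profile your own entry point instead.

```python
import cProfile
import io
import pstats

def classify(items):
    """Stand-in workload: bucket items by length, calling len() per branch."""
    counts = {"long": 0, "medium": 0, "short": 0}
    for item in items:
        if len(item) > 10:
            counts["long"] += 1
        elif len(item) > 6:
            counts["medium"] += 1
        else:
            counts["short"] += 1
    return counts

# Made-up data: strings of length 0..19, repeated 1000 times.
items = ["a" * n for n in range(20)] * 1000

profiler = cProfile.Profile()
profiler.enable()
result = classify(items)
profiler.disable()

# Collect the report in a string and show the top entries by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists how many times each function, including the built-in len, was called and how much time it took, which is the evidence you would want before rewriting a loop for speed.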
Upvotes: 1