Reputation: 175
Not sure if title is correct terminology.
If you have to compare the characters of two strings (A and B) and count how many characters of B also occur in A:
sum([ch in A for ch in B])
is faster on %timeit than
sum(ch in A for ch in B)
I understand that the first one builds a list of booleans and then sums the True values (each counting as 1). The second one is a generator expression. I'm not clear on what it is doing internally and why it is slower.
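For example, with two short made-up strings (the strings used in the timings below are much longer):

# Made-up example values just to show what both expressions compute.
A = "hello"
B = "world"

print(sum([ch in A for ch in B]))  # 2, because 'o' and 'l' from B also occur in A
print(sum(ch in A for ch in B))    # 2, same result, computed lazily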
Thanks.
Edit with %timeit results:
10 characters
generator expression: 10000 loops, best of 3: 112 µs per loop
list comprehension: 10000 loops, best of 3: 94.6 µs per loop

1,000 characters
generator expression: 100 loops, best of 3: 8.5 ms per loop
list comprehension: 100 loops, best of 3: 6.9 ms per loop

10,000 characters
generator expression: 10 loops, best of 3: 87.5 ms per loop
list comprehension: 10 loops, best of 3: 76.1 ms per loop

100,000 characters
generator expression: 1 loop, best of 3: 908 ms per loop
list comprehension: 1 loop, best of 3: 840 ms per loop
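Timings like these can be produced with something along these lines (a sketch: the actual A and B aren't shown above, so random lowercase strings of the stated length are assumed here, and timeit.timeit stands in for IPython's %timeit):

import random
import string
import timeit

# Assumed setup: random lowercase strings of the length being tested.
N = 10_000
A = ''.join(random.choices(string.ascii_lowercase, k=N))
B = ''.join(random.choices(string.ascii_lowercase, k=N))

print(timeit.timeit('sum(ch in A for ch in B)', globals=globals(), number=10))
print(timeit.timeit('sum([ch in A for ch in B])', globals=globals(), number=10))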
Upvotes: 16
Views: 2943
Reputation: 5877
I took a look at the disassembly of each construct (using dis). I did this by declaring these two functions:
def list_comprehension():
    # A and B are strings defined at module level
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)
and then calling dis.dis with each function.
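(To reproduce the listings below yourself, the standard dis module is enough; on Python 3.7+ dis.dis also recurses into the nested <listcomp>/<genexpr> code objects.)

import dis

dis.dis(list_comprehension)
dis.dis(generation_expression)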
For the list comprehension:
0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
4 FOR_ITER 12 (to 18)
6 STORE_FAST 1 (ch)
8 LOAD_FAST 1 (ch)
10 LOAD_GLOBAL 0 (A)
12 COMPARE_OP 6 (in)
14 LIST_APPEND 2
16 JUMP_ABSOLUTE 4
18 RETURN_VALUE
and for the generator expression:
0 LOAD_FAST 0 (.0)
2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (ch)
6 LOAD_FAST 1 (ch)
8 LOAD_GLOBAL 0 (A)
10 COMPARE_OP 6 (in)
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
18 LOAD_CONST 0 (None)
20 RETURN_VALUE
The disassembly for the actual summation is:
0 LOAD_GLOBAL 0 (sum)
2 LOAD_CONST 1 (<code object <genexpr> at 0x7f49dc395240, file "/home/mishac/dev/python/kintsugi/KintsugiModels/automated_tests/a.py", line 12>)
4 LOAD_CONST 2 ('generation_expression.<locals>.<genexpr>')
6 MAKE_FUNCTION 0
8 LOAD_GLOBAL 1 (B)
10 GET_ITER
12 CALL_FUNCTION 1
14 CALL_FUNCTION 1
16 RETURN_VALUE
but this sum disassembly was identical between both your examples, with the only difference being the loading of generation_expression.<locals>.<genexpr> vs list_comprehension.<locals>.<listcomp> (so just loading a different constant: the nested code object and its qualified name).
The bytecode instructions that differ between the first two disassemblies are LIST_APPEND for the list comprehension vs. the pair YIELD_VALUE and POP_TOP for the generator expression.
I won't pretend I know the internals of Python bytecode, but what I gather from this is that the generator expression hands each value back to sum one at a time (YIELD_VALUE) and then, when it is resumed, pops the value pushed back onto its stack (POP_TOP), whereas the list comprehension simply appends each result to the list it is building. That extra per-item suspend-and-resume work doesn't have to happen in a list comprehension, leading me to believe there'll be a slight amount of overhead in using generators.
Now this doesn't mean that generators are always going to be slower. Generators excel at being memory-efficient, so there will be a threshold N such that list comprehensions perform slightly better below it (because memory use isn't a problem), but above it, generators perform significantly better.
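The memory side of that trade-off is easy to see with sys.getsizeof (a quick sketch, not a rigorous benchmark):

import sys

# The list materialises a million booleans up front; the generator object
# stays tiny no matter how many items it will eventually produce.
big_list = [x % 2 == 0 for x in range(1_000_000)]
lazy_gen = (x % 2 == 0 for x in range(1_000_000))

print(sys.getsizeof(big_list))  # on the order of megabytes
print(sys.getsizeof(lazy_gen))  # on the order of a hundred bytes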
Upvotes: 15
Reputation: 9197
Generators are usually slower than list comprehensions; the whole point of generators is to be memory-efficient, because they yield each item lazily, creating it only when it is actually needed. They favor memory efficiency over speed.
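That laziness is easy to demonstrate with a small (hypothetical) generator that prints whenever it actually produces an item:

def lazily(items):
    # Nothing here runs until a value is requested from the generator.
    for item in items:
        print(f"producing {item}")
        yield item

gen = lazily("abc")   # no output yet: no work has been done
first = next(gen)     # prints "producing a" and returns 'a'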
Upvotes: 1