Reputation: 440
I'm trying to use the timeit module in Python (EDIT: We are using Python 3) to decide between a couple of different code flows. In our code, we have a series of if-statements that test for the existence of a character code in a string, and if it's there replace it like this:
if "<substring>" in str_var:
str_var = str_var.replace("<substring>", "<new_substring>")
We do this a number of times for different substrings. We're debating between that and using just the replace like this:
str_var = str_var.replace("<substring>", "<new_substring>")
We tried to use timeit to determine which one was faster. If the first code-block above is "stmt1" and the second is "stmt2", and our setup string looks like
str_var = '<string><substring><more_string>',
our timeit statements will look like this:
timeit.timeit(stmt=stmt1, setup=setup)
and
timeit.timeit(stmt=stmt2, setup=setup)
Now, running it just like that, on 2 of our laptops (same hardware, similar processing load) stmt1 (the statement with the if-statement) runs faster even after multiple runs (3-4 hundredths of a second vs. about a quarter of a second for stmt2).
However, if we define functions to do both things (including the setup creating the variable) like so:
def foo():
str_var = '<string><substring><more_string>'
if "<substring>" in str_var:
str_var = str_var.replace("<substring>", "<new_substring>")
and
def foo2():
str_var = '<string><substring><more_string>'
str_var = str_var.replace("<substring>", "<new_substring>")
and run timeit like:
timeit.timeit("foo()", setup="from __main__ import foo")
timeit.timeit("foo2()", setup="from __main__ import foo2")
the statement without the if-statement (foo2) runs faster, contradicting the non-functioned results.
Are we missing something about how Timeit works? Or how Python handles a case like this?
edit here is our actual code:
>>> def foo():
s = "hi 1 2 3"
s = s.replace('1','5')
>>> def foo2():
s = "hi 1 2 3"
if '1' in s:
s = s.replace('1','5')
>>> timeit.timeit(foo, "from __main__ import foo")
0.4094226634183542
>>> timeit.timeit(foo2, "from __main__ import foo2")
0.4815539780738618
vs this code:
>>> timeit.timeit("""s = s.replace("1","5")""", setup="s = 'hi 1 2 3'")
0.18738432400277816
>>> timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.02985000199987553
Upvotes: 6
Views: 755
Reputation: 29974
I think I've got it.
Look at this code:
timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")
In this code, setup
is run exactly once. That means that s
becomes a "global". As a result, it gets modified to hi 5 2 3
in the first iteration and in
now returns False
for all successive iterations.
See this code:
timeit.timeit("""if '1' in s: s = s.replace('1','5'); print(s)""", setup="s = 'hi 1 2 3'")
This will print out hi 5 2 3
a single time because the print
is part of the if
statement. Contrast this, which will fill up your screen with a ton of hi 5 2 3
s:
timeit.timeit("""s = s.replace("1","5"); print(s)""", setup="s = 'hi 1 2 3'")
So the problem here is that the non-function with if
test is flawed and is giving you false timings, unless repeated calls on an already processed string is what you were trying to test. (If it is what you were trying to test, your function versions are flawed.) The reason the function with if
doesn't fair better is because it's running the replace
on a fresh copy of the string for each iteration.
The following test does what I believe you intended since it doesn't re-assign the result of the replace
back to s
, leaving it unmodified for each iteration:
>>> timeit.timeit("""if '1' in s: s.replace('1','5')""", setup="s = 'hi 1 2 3'"
0.3221409016812231
>>> timeit.timeit("""s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.28558505721252914
This change adds a lot of time to the if
test and adds a little bit of time to the non-if
test for me, but I'm using Python 2.7. If the Python 3 results are consistent, though, these results suggest that in
saves a lot of time when the strings rarely need any replacing. If they usually do require replacement, it appears in
costs a little bit of time.
Upvotes: 5
Reputation: 77494
Made even weirder by looking at the disassembled code. The second block has the if
version (which clocks in faster for me using timeit
just as in the OP's example).
Yet, by looking at the op codes, it purely appears to have 7 extra op codes, starting with the first BUILD_MAP
and also involving one extra POP_JUMP_IF_TRUE
(presumably for the if
statement check itself). Before and after that, all codes are the same.
This would suggest that building and performing the check in the if
statement somehow reduces the computation time for then checking within the call to replace
. How can we see specific timing information for the different op codes?
In [55]: dis.disassemble_string("s='HI 1 2 3'; s = s.replace('1','4')")
0 POP_JUMP_IF_TRUE 10045
3 PRINT_NEWLINE
4 PRINT_ITEM_TO
5 SLICE+2
6 <49>
7 SLICE+2
8 DELETE_SLICE+0
9 SLICE+2
10 DELETE_SLICE+1
11 <39>
12 INPLACE_MODULO
13 SLICE+2
14 POP_JUMP_IF_TRUE 15648
17 SLICE+2
18 POP_JUMP_IF_TRUE 29230
21 LOAD_NAME 27760 (27760)
24 STORE_GLOBAL 25955 (25955)
27 STORE_SLICE+0
28 <39>
29 <49>
30 <39>
31 <44>
32 <39>
33 DELETE_SLICE+2
34 <39>
35 STORE_SLICE+1
In [56]: dis.disassemble_string("s='HI 1 2 3'; if '1' in s: s = s.replace('1','4')")
0 POP_JUMP_IF_TRUE 10045
3 PRINT_NEWLINE
4 PRINT_ITEM_TO
5 SLICE+2
6 <49>
7 SLICE+2
8 DELETE_SLICE+0
9 SLICE+2
10 DELETE_SLICE+1
11 <39>
12 INPLACE_MODULO
13 SLICE+2
14 BUILD_MAP 8294
17 <39>
18 <49>
19 <39>
20 SLICE+2
21 BUILD_MAP 8302
24 POP_JUMP_IF_TRUE 8250
27 POP_JUMP_IF_TRUE 15648
30 SLICE+2
31 POP_JUMP_IF_TRUE 29230
34 LOAD_NAME 27760 (27760)
37 STORE_GLOBAL 25955 (25955)
40 STORE_SLICE+0
41 <39>
42 <49>
43 <39>
44 <44>
45 <39>
46 DELETE_SLICE+2
47 <39>
48 STORE_SLICE+1
Upvotes: 0