Reputation: 315
What is the reason that casting an integer to a float is slower than adding 0.0 to that int in Python?
import timeit

def add_simple():
    for i in range(1000):
        a = 1 + 0.0

def cast_simple():
    for i in range(1000):
        a = float(1)

def add_total():
    total = 0
    for i in range(1000):
        total += 1 + 0.0

def cast_total():
    total = 0
    for i in range(1000):
        total += float(1)

print "Add simple timing: %s" % timeit.timeit(add_simple, number=1)
print "Cast simple timing: %s" % timeit.timeit(cast_simple, number=1)
print "Add total timing: %s" % timeit.timeit(add_total, number=1)
print "Cast total timing: %s" % timeit.timeit(cast_total, number=1)
The output of which is:
Add simple timing: 0.0001220703125
Cast simple timing: 0.000469923019409
Add total timing: 0.000164985656738
Cast total timing: 0.00040078163147
Upvotes: 19
Views: 2282
Reputation: 531135
Simply speaking, you aren't casting anything. A type cast tells the compiler to treat the value in a variable as if it had a different type; the same underlying bits are used. Python's float(1), however, constructs a new object in memory, distinct from the argument passed to float.
When you add 1 + 0.0, this goes through the numeric addition protocol: (1).__add__(0.0) is tried first, and the built-in float type knows how to deal with int operands (via __radd__ if int's __add__ declines). No additional objects (aside from the return value) need to be constructed.
Newer versions of Python have an optimizer, so constant expressions like 1 + 0.0 can be replaced at compile time with 1.0; no functions need to be executed at run time at all. Replace 1 + 0.0 with x = 1 (prior to the loop) and x + 0.0 to force the addition to be performed at run time and observe the difference. It will be slower than 1 + 0.0, but still faster than float(1).
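A minimal sketch of that experiment (the numbers below are illustrative; absolute timings vary by machine and Python version):

```python
import timeit

# Hoisting the 1 into a variable defeats constant folding, so the
# run-time addition machinery really executes on every iteration.
add_runtime = timeit.timeit('x + 0.0', setup='x = 1', number=100000)
cast_runtime = timeit.timeit('float(x)', setup='x = 1', number=100000)

print("x + 0.0 :", add_runtime)
print("float(x):", cast_runtime)
```

On typical CPython builds the x + 0.0 variant still comes out ahead of float(x), just by a smaller margin than the constant-folded 1 + 0.0.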
Upvotes: 2
Reputation: 280564
If you look at the bytecode for add_simple:
>>> dis.dis(add_simple)
2 0 SETUP_LOOP 26 (to 29)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 12 (to 28)
16 STORE_FAST 0 (i)
3 19 LOAD_CONST 4 (1.0)
22 STORE_FAST 1 (a)
25 JUMP_ABSOLUTE 13
>> 28 POP_BLOCK
>> 29 LOAD_CONST 0 (None)
32 RETURN_VALUE
You'll see that 0.0 isn't actually anywhere in there. It just loads the constant 1.0 and stores it to a. Python computed the result at compile time, so you're not actually timing the addition.
If you use a variable for 1, so Python's primitive peephole optimizer can't do the addition at compile time, adding 0.0 still has a lead:
>>> timeit.timeit('float(a)', 'a=1')
0.22538208961486816
>>> timeit.timeit('a+0.0', 'a=1')
0.13347005844116211
Calling float requires two dict lookups to figure out what float is: one in the module's global namespace and one in the built-ins. It also has Python function call overhead, which is more expensive than a C function call.
Adding 0.0 only requires indexing into the function's code object's co_consts to load the constant 0.0, and then calling the C-level nb_add functions of the int and float types to perform the addition. This is less overhead overall.
Upvotes: 11
Reputation: 51807
If you use the dis module, you can start to see why:
In [11]: dis.dis(add_simple)
2 0 SETUP_LOOP 26 (to 29)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 12 (to 28)
16 STORE_FAST 0 (i)
3 19 LOAD_CONST 4 (1.0)
22 STORE_FAST 1 (a)
25 JUMP_ABSOLUTE 13
>> 28 POP_BLOCK
>> 29 LOAD_CONST 0 (None)
32 RETURN_VALUE
In [12]: dis.dis(cast_simple)
2 0 SETUP_LOOP 32 (to 35)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 18 (to 34)
16 STORE_FAST 0 (i)
3 19 LOAD_GLOBAL 1 (float)
22 LOAD_CONST 2 (1)
25 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
28 STORE_FAST 1 (a)
31 JUMP_ABSOLUTE 13
>> 34 POP_BLOCK
>> 35 LOAD_CONST 0 (None)
38 RETURN_VALUE
Note the CALL_FUNCTION. Function calls in Python are (relatively) slow, as are attribute (.) lookups. Because casting to float requires a function call, it's slower.
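A small sketch of the call overhead on its own (numbers are illustrative): the measured expression is trivial in both cases, so the difference is mostly the CALL_FUNCTION machinery.

```python
import timeit

# Loading a name vs. calling a function that just returns its argument.
inline = timeit.timeit('x', setup='x = 1.0', number=100000)
called = timeit.timeit('f(x)', setup='x = 1.0\ndef f(v): return v', number=100000)

print("bare load :", inline)
print("call f(x) :", called)
```
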
Upvotes: 16
Reputation: 6632
If you're using Python 3 or a recent version of Python 2 (2.5 or higher), it does constant folding at bytecode generation time. This means that 1 + 0.0 is replaced with 1.0 before the code is executed.
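You can confirm the folding directly with compile and dis (a quick sketch; the exact opcodes differ between CPython versions):

```python
import dis

code = compile('1 + 0.0', '<folding-test>', 'eval')

# The folded result 1.0 appears among the constants, and on versions
# that fold constants no addition opcode (BINARY_ADD / BINARY_OP) is
# emitted at all.
print(code.co_consts)
dis.dis(code)
```
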
Upvotes: 7
Reputation: 301
The addition can be done entirely in C. The cast causes a function to be called before the C-level code kicks in, and there's overhead for that function call.
Upvotes: 3