Reputation: 12026
I always thought comparisons were the fastest operation a computer could execute. I remember hearing it in a presentation by D. Knuth, where he would write loops in descending order "because comparison against 0 is fast". I also read here that multiplications should be slower than additions.
I'm surprised to see that, in both Python 2 and 3, testing under both Linux and Mac, comparisons seem to be much slower than arithmetic operations.
Could anyone explain why?
%timeit 2 > 0
10000000 loops, best of 3: 41.5 ns per loop
%timeit 2 * 2
10000000 loops, best of 3: 27 ns per loop
%timeit 2 * 0
10000000 loops, best of 3: 27.7 ns per loop
%timeit True != False
10000000 loops, best of 3: 75 ns per loop
%timeit True and False
10000000 loops, best of 3: 58.8 ns per loop
And under Python 3:
$ ipython3
Python 3.5.2 | packaged by conda-forge | (default, Sep 8 2016, 14:36:38)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: %timeit 2 + 2
10000000 loops, best of 3: 22.9 ns per loop
In [2]: %timeit 2 * 2
10000000 loops, best of 3: 23.7 ns per loop
In [3]: %timeit 2 > 2
10000000 loops, best of 3: 45.5 ns per loop
In [4]: %timeit True and False
10000000 loops, best of 3: 62.8 ns per loop
In [5]: %timeit True != False
10000000 loops, best of 3: 92.9 ns per loop
Upvotes: 24
Views: 7212
Reputation: 14075
Wow, the answer by @mu 無 blew my mind! However, it is important not to generalize when drawing your conclusions... You are checking the times for CONSTANTS, not variables. For variables, multiplication seems to be slower than comparison.
Here is the more interesting case, in which the numbers to be compared are stored in actual variables...
import timeit

def go():
    number = 1000000000
    print
    print 'a>b, internal:', timeit.timeit(setup="a=1;b=1", stmt="a>b", number=number)
    print 'a*b, internal:', timeit.timeit(setup="a=1;b=1", stmt="a*b", number=number)

go()

And the "shell" timings, run as IPython magics in their own cells (%timeit cannot appear inside a plain function):

In [1]: a = 1; b = 1
In [2]: %timeit -n 1000000000 a > b
In [3]: %timeit -n 1000000000 a * b
The result gives:
a>b, internal: 51.9467676445
a*b, internal: 63.870462403
a>b, shell :1000000000 loops, best of 3: 19.8 ns per loop
a*b, shell :1000000000 loops, best of 3: 19.9 ns per loop
And order is restored in the universe ;)
For completeness, let's see some more cases... What about when we have one variable and one constant?
Since these are shell-only timings, they can be run directly from IPython:

In [1]: a = 42
In [2]: %timeit -n 10000000 a > 2
In [3]: %timeit -n 10000000 a * 2
a>2, shell :10000000 loops, best of 3: 18.3 ns per loop
a*2, shell :10000000 loops, best of 3: 19.3 ns per loop
What happens with bools?
import timeit

def go():
    print
    number = 1000000000
    print 'a==b    : ', timeit.timeit(setup="a=True;b=False", stmt="a==b", number=number)
    print 'a and b : ', timeit.timeit(setup="a=True;b=False", stmt="a and b", number=number)

go()

And the boolean "shell" timings from IPython:

In [1]: a = True; b = False
In [2]: %timeit -n 1000000000 a == b
In [3]: a = False; b = False
In [4]: %timeit -n 1000000000 a and b
a==b : 70.8013108982
a and b : 38.0614485665
boolean ==, shell :1000000000 loops, best of 3: 17.7 ns per loop
boolean and, shell :1000000000 loops, best of 3: 16.4 ns per loop
:D Now this is interesting: it seems boolean 'and' is faster than '=='. However, all of this would be fine and Donald Knuth would not lose sleep over it; the best way to compare booleans would be to use 'and'...
In practice, we should also check numpy, which may be even more relevant...
import timeit

def go():
    number = 1000000  # change if you are in a hurry / want to be more certain
    print '==== int ===='
    print 'a>b : ', timeit.timeit(setup="a=1;b=2", stmt="a>b", number=number*100)
    print 'a*b : ', timeit.timeit(setup="a=1;b=2", stmt="a*b", number=number*100)
    setup = "import numpy as np;a=np.arange(0,100);b=np.arange(100,0,-1);"
    print 'np: a>b : ', timeit.timeit(setup=setup, stmt="a>b", number=number)
    print 'np: a*b : ', timeit.timeit(setup=setup, stmt="a*b", number=number)
    print '==== float ===='
    print 'float a>b : ', timeit.timeit(setup="a=1.1;b=2.3", stmt="a>b", number=number*100)
    print 'float a*b : ', timeit.timeit(setup="a=1.1;b=2.3", stmt="a*b", number=number*100)
    setup = "import numpy as np;a=np.arange(0,100,dtype=float);b=np.arange(100,0,-1,dtype=float);"
    print 'np float a>b : ', timeit.timeit(setup=setup, stmt="a>b", number=number)
    print 'np float a*b : ', timeit.timeit(setup=setup, stmt="a*b", number=number)
    print '==== bool ===='
    print 'a==b : ', timeit.timeit(setup="a=True;b=False", stmt="a==b", number=number*1000)
    print 'a and b : ', timeit.timeit(setup="a=True;b=False", stmt="a and b", number=number*1000)
    setup = "import numpy as np;a=np.arange(0,100)>50;b=np.arange(100,0,-1)>50;"
    print 'np a == b : ', timeit.timeit(setup=setup, stmt="a == b", number=number)
    print 'np a and b : ', timeit.timeit(setup=setup, stmt="np.logical_and(a,b)", number=number)
    print 'np a == True : ', timeit.timeit(setup=setup, stmt="a == True", number=number)
    print 'np a and True : ', timeit.timeit(setup=setup, stmt="np.logical_and(a,True)", number=number)

go()
==== int ====
a>b : 4.5121130192
a*b : 5.62955748632
np: a>b : 0.763992986986
np: a*b : 0.723006032235
==== float ====
float a>b : 6.39567713272
float a*b : 5.62149055215
np float a>b : 0.697037433398
np float a*b : 0.847941712765
==== bool ====
a==b : 6.91458288689
a and b : 3.6289697892
np a == b : 0.789666454087
np a and b : 0.724517620007
np a == True : 1.55066706189
np a and True : 1.44293071804
Again, the same behavior... So I guess one can benefit from using 'and' instead of '==' in general,
at least in Python 2 (Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]), where I tried all of this...
Upvotes: 1
Reputation: 76887
This is happening due to constant folding in the peephole optimizer within the Python compiler.
Using the dis module, if we break each of the statements down to see how they are translated to bytecode, you will observe that for comparisons such as inequality or equality the operands are first loaded and the comparison is then evaluated at run time. However, constant arithmetic expressions such as multiplication and addition are computed at compile time and loaded as a single constant.
Overall, this results in fewer execution steps, which makes the folded expressions faster:
>>> import dis
>>> def m1(): True != False
>>> dis.dis(m1)
1 0 LOAD_GLOBAL 0 (True)
3 LOAD_GLOBAL 1 (False)
6 COMPARE_OP 3 (!=)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def m2(): 2 *2
>>> dis.dis(m2)
1 0 LOAD_CONST 2 (4)
3 POP_TOP
4 LOAD_CONST 0 (None)
7 RETURN_VALUE
>>> def m3(): 2*5
>>> dis.dis(m3)
1 0 LOAD_CONST 3 (10)
3 POP_TOP
4 LOAD_CONST 0 (None)
7 RETURN_VALUE
>>> def m4(): 2 > 0
>>> dis.dis(m4)
1 0 LOAD_CONST 1 (2)
3 LOAD_CONST 2 (0)
6 COMPARE_OP 4 (>)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def m5(): True and False
>>> dis.dis(m5)
1 0 LOAD_GLOBAL 0 (True)
3 JUMP_IF_FALSE_OR_POP 9
6 LOAD_GLOBAL 1 (False)
>> 9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Upvotes: 22
Reputation: 17114
As others have explained, this is because Python's peephole optimiser optimises arithmetic operations but not comparisons.
Having written my own peephole optimiser for a Basic compiler, I can assure you that optimising constant comparisons is just as easy as optimising constant arithmetic operations. So there is no technical reason why Python should do the latter but not the former.
However, each such optimisation has to be separately programmed, and comes with two costs: the time to program it, and the extra optimising code taking up space in the Python executable. So you find yourself having to do some triage: which of these optimisations is common enough to make it worth the costs?
It seems that the Python implementers, reasonably enough, decided to optimise the arithmetic operations first. Perhaps they will get round to comparisons in a future release.
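If you want to check the asymmetry yourself on the CPython versions discussed here, a quick sketch (using only the standard compile and dis functions) is to disassemble two small constant expressions:

import dis

# Constant arithmetic is folded at compile time into a single LOAD_CONST...
dis.dis(compile("2 * 3", "<arith>", "eval"))
# ...while a constant comparison still loads both operands and runs COMPARE_OP at run time.
dis.dis(compile("2 < 3", "<cmp>", "eval"))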
Upvotes: 9
Reputation: 9481
For the Python case the above answers are correct. For machine code, things are a bit more complicated. I assume we are talking about integer operations here; with floats and complex objects none of the below applies. I also assume that the values you are comparing are already loaded into registers. If they are not, fetching them from wherever they live can take hundreds of times longer than the actual operations.
Modern CPUs have several ways to compare two numbers. Very popular ones are XOR a,b if you just want to see whether two values are equal, and CMP a,b if you want to know the relationship between them (less, greater, equal, etc.). A CMP operation is just a subtraction with the result thrown away, because we are only interested in the flags it sets.
Both of these operations have depth 1, so they can be executed in a single CPU cycle. That is as fast as you can go. Multiplication is a form of repeated addition, so its depth is usually on the order of the size of your register. There are optimizations that reduce the depth, but generally multiplication is one of the slower operations a CPU can perform.
However, multiplying by 0, 1, or any power of 2 can be reduced to a shift, which is also a depth-one operation, so it takes the same time as comparing two numbers. Think of the decimal system: you can multiply any number by 10, 100, or 1000 by appending zeros to the end of it. Any optimizing compiler will recognize this kind of multiplication and use the most efficient operation for it. Modern CPUs are also pretty advanced, so they can perform the same optimization in hardware by counting how many bits are set in either operand; if only one bit is set, the operation is reduced to a shift.
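As a small aside, the power-of-two equivalence is easy to convince yourself of in Python itself (this only illustrates the arithmetic identity, not what the interpreter or CPU actually emits):

x = 37  # any integer works here
assert x * 8 == x << 3        # 8 == 2**3, so multiplying by 8 is a left shift by 3
assert x * 1024 == x << 10    # 1024 == 2**10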
So in your case multiplying by 2 is as fast as comparing two numbers. And as people above pointed out, any optimizing compiler will see that you are multiplying two constants, so it will simply replace the expression with the resulting constant.
Upvotes: 3
Reputation: 3116
As others have commented, this is due to the peephole optimizer, which pre-computes the result of 2*3 (6), as the dis output shows:
0 LOAD_CONST 3 (6)
But try this instead; taking the operands as function arguments prevents the optimizer from pre-computing the result:
>>> def a(a, b):
... return a*b
...
>>> dis.dis(a)
2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 BINARY_MULTIPLY
7 RETURN_VALUE
>>> def c(a,b):
... return a<b
...
>>> dis.dis(c)
2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 COMPARE_OP 0 (<)
9 RETURN_VALUE
>>>
If you time these functions, the compare will be faster.
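For instance, a rough way to time them (a minimal sketch: a and c are the functions defined above, and the exact numbers will depend on your machine and Python version):

import timeit

# Compare the variable-based multiply against the variable-based comparison.
print(timeit.timeit('a(2, 3)', setup='from __main__ import a', number=10**7))
print(timeit.timeit('c(2, 3)', setup='from __main__ import c', number=10**7))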
Upvotes: 4
Reputation: 73460
A quick disassembly reveals that the comparison involves more operations. According to this answer, there is some precalculation done by the "peephole optimiser" (wiki) for multiplication, addition, etc., but not for the comparison operators:
>>> import dis
>>> def a():
... return 2*3
...
>>> dis.dis(a)
2 0 LOAD_CONST 3 (6)
3 RETURN_VALUE
>>> def b():
... return 2 < 3
...
>>> dis.dis(b)
2 0 LOAD_CONST 1 (2)
3 LOAD_CONST 2 (3)
6 COMPARE_OP 0 (<)
9 RETURN_VALUE
Upvotes: 4