Reputation: 333
from time import time

l = [1,0,0,1,1]

count = 0
start = time()
for _ in range(100000):
    for x in range(len(l)-1):
        for y in range(x+1, len(l)):
            if l[x] + l[y] == 1:
                count += 1
end = time()

count2 = 0
start2 = time()
for _ in range(100000):
    for x in range(len(l)-1):
        for y in range(x+1, len(l)):
            if l[x]^l[y]:
                count2 += 1
end2 = time()

print str(count) + ' : Add and compare took: ' + str((end - start)/100000)
print str(count2) + ' : Bitwise took: ' + str((end2 - start2)/100000)
From what I understood of bitwise operations, they were supposed to be faster than simple comparisons. Yet these two loops end up running at essentially the same speed.
What is the benefit of using the arguably more complicated bitwise operations over integer comparison when integer comparison appears to be just as fast?
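As a rough cross-check of the same question (a sketch using Python 3's `timeit` rather than the Python 2 timings above): both condition expressions spend most of their time on interpreter dispatch and list indexing, so the single arithmetic opcode is a small fraction of the total.

```python
import timeit

l = [1, 0, 0, 1, 1]

# Time just the two condition expressions from the loops above.
t_add = timeit.timeit('l[0] + l[1] == 1', globals={'l': l}, number=1000000)
t_xor = timeit.timeit('l[0] ^ l[1]', globals={'l': l}, number=1000000)

print('add+compare: %.3fs  xor: %.3fs' % (t_add, t_xor))
```

On a typical CPython build the two numbers come out within a few percent of each other, matching the observation above.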
Edit:
It appears that opcodes are not all created equal.
Austin's answer mentioned that the difference between the two operations was 3 opcodes vs 1; however, the following example has the same number of opcodes but significantly different performance:
from time import time
from dis import dis

i = j = 10

def test1():
    if i == j:
        print True

def test2():
    if not i-j:
        print True

print 'test 1'
start1 = time()
test1()
end1 = time()
dis(test1)

print 'test 2'
start2 = time()
test2()
end2 = time()
dis(test2)

print 'Test 1 took: ' + str(end1 - start1)
print 'Test 2 took: ' + str(end2 - start2)
This will output:
test 1
True
25 0 LOAD_GLOBAL 0 (i)
3 LOAD_GLOBAL 1 (j)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 20
26 12 LOAD_GLOBAL 2 (True)
15 PRINT_ITEM
16 PRINT_NEWLINE
17 JUMP_FORWARD 0 (to 20)
>> 20 LOAD_CONST 0 (None)
23 RETURN_VALUE
test 2
True
29 0 LOAD_GLOBAL 0 (i)
3 LOAD_GLOBAL 1 (j)
6 BINARY_SUBTRACT
7 POP_JUMP_IF_TRUE 18
30 10 LOAD_GLOBAL 2 (True)
13 PRINT_ITEM
14 PRINT_NEWLINE
15 JUMP_FORWARD 0 (to 18)
>> 18 LOAD_CONST 0 (None)
21 RETURN_VALUE
Test 1 took: 7.86781311035e-06
Test 2 took: 5.00679016113e-06
Is there a more accurate way of measuring efficiency?
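One conventional answer to the measurement question (a sketch using the stdlib `timeit` module rather than the `time()`-around-one-call pattern above): run the statement many times per trial, repeat the trial several times, and take the minimum, which filters out OS scheduling noise and one-off costs such as the `print` I/O.

```python
import timeit

setup = 'i = j = 10'

# min() over several repeats is the usual estimate: it is the trial with
# the least interference from the rest of the system.
t_eq = min(timeit.repeat('i == j', setup=setup, number=1000000, repeat=5))
t_sub = min(timeit.repeat('not i - j', setup=setup, number=1000000, repeat=5))

print('i == j  : %.3fs' % t_eq)
print('not i-j : %.3fs' % t_sub)
```

Timing a single call, as above, measures mostly call overhead and whatever the OS happened to be doing in that microsecond.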
Edit:
Opcodes are created almost equal.
Modifying the code to exclude the I/O shows how much the `print` calls were distorting the timings.
from time import time
from dis import dis

i = j = 10
bool1 = False
bool2 = False

def test1():
    if i == j:
        bool1 = True

def test2():
    if not i-j:
        bool2 = True

print 'test 1'
start1 = time()
for _ in range(1000000):
    test1()
end1 = time()
dis(test1)

print 'test 2'
start2 = time()
for _ in range(1000000):
    test2()
end2 = time()
dis(test2)

print str(bool1) + ' : Test 1 took: ' + str(end1 - start1)
print str(bool2) + ' : Test 2 took: ' + str(end2 - start2)
Will print:
test 1
27 0 LOAD_GLOBAL 0 (i)
3 LOAD_GLOBAL 1 (j)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 21
28 12 LOAD_GLOBAL 2 (True)
15 STORE_FAST 0 (bool1)
18 JUMP_FORWARD 0 (to 21)
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
test 2
31 0 LOAD_GLOBAL 0 (i)
3 LOAD_GLOBAL 1 (j)
6 BINARY_SUBTRACT
7 POP_JUMP_IF_TRUE 19
32 10 LOAD_GLOBAL 2 (True)
13 STORE_FAST 0 (bool2)
16 JUMP_FORWARD 0 (to 19)
>> 19 LOAD_CONST 0 (None)
22 RETURN_VALUE
False : Test 1 took: 0.156816959381
False : Test 2 took: 0.16281914711
So the gap is not as drastic, but the timings still differ slightly. This was run ~12 times, with Test 1 taking longer only once.
So there is still some mystery, just not as drastic a one.
Upvotes: 1
Views: 71
Reputation: 15310
All the code in your loops is basically the same, so I eliminated it. Instead, I reduced your code to two functions and asked my good friend dis.dis
to show me what they were doing:
l = [1,0,0,1,1]

def f1():
    x = y = 0
    if l[x] + l[y] == 1:
        count += 1

def f2():
    x = y = 0
    if l[x]^l[y]:
        count2 += 1

import dis
print "F1"
dis.dis(f1)
print "F2"
dis.dis(f2)
Here's the output:
$ python2.7 test.py
F1
4 0 LOAD_CONST 1 (0)
3 DUP_TOP
4 STORE_FAST 0 (x)
7 STORE_FAST 1 (y)
5 10 LOAD_GLOBAL 0 (l)
13 LOAD_FAST 0 (x)
16 BINARY_SUBSCR
17 LOAD_GLOBAL 0 (l)
20 LOAD_FAST 1 (y)
23 BINARY_SUBSCR
24 BINARY_ADD ## Here.
25 LOAD_CONST 2 (1) ## Here.
28 COMPARE_OP 2 (==) ## Here.
31 POP_JUMP_IF_FALSE 47
6 34 LOAD_FAST 2 (count)
37 LOAD_CONST 2 (1)
40 INPLACE_ADD
41 STORE_FAST 2 (count)
44 JUMP_FORWARD 0 (to 47)
>> 47 LOAD_CONST 0 (None)
50 RETURN_VALUE
F2
9 0 LOAD_CONST 1 (0)
3 DUP_TOP
4 STORE_FAST 0 (x)
7 STORE_FAST 1 (y)
10 10 LOAD_GLOBAL 0 (l)
13 LOAD_FAST 0 (x)
16 BINARY_SUBSCR
17 LOAD_GLOBAL 0 (l)
20 LOAD_FAST 1 (y)
23 BINARY_SUBSCR
24 BINARY_XOR ## Here.
25 POP_JUMP_IF_FALSE 41
11 28 LOAD_FAST 2 (count2)
31 LOAD_CONST 2 (1)
34 INPLACE_ADD
35 STORE_FAST 2 (count2)
38 JUMP_FORWARD 0 (to 41)
>> 41 LOAD_CONST 0 (None)
44 RETURN_VALUE
The difference is 3 opcodes versus 1, while the setup shared by both (the loads and subscripts) is 6 opcodes either way. The difference is lost in the noise.
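The same comparison can be reproduced on Python 3, where the opcode names differ from the 2.7 listing above (recent versions use `BINARY_OP` in place of `BINARY_ADD`/`BINARY_XOR`) but the shape is identical: the add-and-compare condition compiles to a couple more instructions than the xor, on top of an identical setup. A sketch:

```python
import dis

l = [1, 0, 0, 1, 1]

def f1(x=0, y=0):
    return l[x] + l[y] == 1   # subscripts, add, load const 1, compare

def f2(x=0, y=0):
    return l[x] ^ l[y]        # subscripts, xor

# Count the compiled instructions for each condition.
n1 = len(list(dis.get_instructions(f1)))
n2 = len(list(dis.get_instructions(f2)))
print('f1: %d instructions, f2: %d instructions' % (n1, n2))
```

The extra instructions (loading the constant `1` and the comparison) are cheap relative to the global lookups and subscript operations both versions share, which is why the measured difference vanishes into interpreter overhead.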
Upvotes: 3