mike jones
mike jones

Reputation: 1

Performance of two operations to get the sum of the first n natural numbers

I was working on Project Euler problem 12 and came up with the following solution:

import math

def main():
    num = 0
    j = 0
    numFactors = 0
    while numFactors <= 500:
        j += 1


        num = num + j #num = sum of numbers from 1 to j
        #num = (j *(j+1))//2


        numFactors = 0
        for n in range(1,int(math.sqrt(num))):
            if num % n == 0:
                numFactors += 2

    print(num)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(lambda: main())
    print(t.timeit(number=1))

My question is about the line num = num + j against num = (j *(j+1))//2.

When using just the single addition, the program is consistently slower than if I uncomment the next line and do a multiplication, an addition and a division.

Can anyone explain why the single operation is slower than doing 3?

Upvotes: 0

Views: 93

Answers (1)

alexwlchan
alexwlchan

Reputation: 6098

I don’t actually see any difference in the performance on my machine. I would also be moderately surprised if the multiplication actually was faster than the addition.

We can use the dis module to disassemble the byte code from the two variants. Most of it is the same across two variants, so the performance in either function should be identical. These are the two sections that are different (line 11 is num = num + j, line 12 is num = (j *(j+1))//2):

 11          43 LOAD_FAST                0 (num)
             46 LOAD_FAST                1 (j)
             49 BINARY_ADD          
             50 STORE_FAST               0 (num)

 12          43 LOAD_FAST                1 (j)
             46 LOAD_FAST                1 (j)
             49 LOAD_CONST               3 (1)
             52 BINARY_ADD          
             53 BINARY_MULTIPLY     
             54 LOAD_CONST               4 (2)
             57 BINARY_FLOOR_DIVIDE 
             58 STORE_FAST               0 (num)

Three operations are present in both disassemblies. Here it is again, only showing the lines unique to each variant:

 11          43 LOAD_FAST                0 (num)

 12          43 LOAD_FAST                1 (j)
             49 LOAD_CONST               3 (1)
             53 BINARY_MULTIPLY     
             54 LOAD_CONST               4 (2)
             57 BINARY_FLOOR_DIVIDE

If the multiplication is truly faster than the addition, then I expect somebody well-versed in byte code could explain why the LOAD_FAST for num is faster than the five operations for line 12. But to my layman’s eye, I’d expect the variant with more bytecode operations to take longer.

Upvotes: 1

Related Questions