zvone

Reputation: 19392

Why is Python's built-in sum much slower than manual summation?

Here is my simple code example:

import time

t0 = time.time()
s = 0
for i in range(1000000):
    s += i
t1 = time.time()

print(s, t1 - t0)

t0 = time.time()
s = sum(i for i in range(1000000))
t1 = time.time()

print(s, t1 - t0)

On my computer (with Python 3.8) it prints:

499999500000 0.22901296615600586
499999500000 1.6930372714996338

So, doing += a million times is 7 times faster than calling sum? That is really unexpected. What is it doing?


Edit: I foolishly allowed a debugger to attach to the process and interfere with my measurements, which turned out to be the cause of the slowness. With the debugger out of the picture, the measurements are no longer so unpredictable. As some of the answers clearly show, what I observed should not happen.

Upvotes: 6

Views: 349

Answers (4)

norok2

Reputation: 26956

First of all, your observation probably does not generalize well to other systems: this way of measuring is quite unreliable, because it is susceptible to performance fluctuations that are largely driven by how your OS reacts to the system load at the time of measurement. You should use timeit or something similar.
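
A minimal sketch of what that could look like with the plain timeit module (the labels and the number/repeat values below are just illustrative choices, not something from the question):

import timeit

setup = "n = 1000000"
snippets = {
    "loop":   "s = 0\nfor i in range(n):\n    s += i",
    "genexp": "sum(i for i in range(n))",
    "range":  "sum(range(n))",
}

for label, stmt in snippets.items():
    # repeat() reruns each measurement several times; the minimum is
    # usually the least noisy estimate, being least affected by
    # background load.
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=10))
    print(label, best / 10, "seconds per call")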

As a reference point, these are the timings I get on Python 3.6 in a virtual environment (Google Colab), which seem to be quite consistent with the other answers:

import numba as nb


def sum_loop(n):
    result = 0
    for x in range(n):
        result += x
    return result


sum_loop_nb = nb.jit(sum_loop)
sum_loop_nb.__name__ = 'sum_loop_nb'


def sum_analytical(n):
    return n * (n - 1) // 2


def sum_list(n):
    return sum([x for x in range(n)])


def sum_gen(n):
    return sum(x for x in range(n))


def sum_range(n):
    return sum(range(n))


sum_loop_nb(10)  # to trigger compilation


funcs = sum_analytical, sum_loop, sum_loop_nb, sum_gen, sum_list, sum_range


n = 1000000
for func in funcs:
    print(func.__name__, func(n))
    %timeit func(n)
# sum_analytical 499999500000
# 10000000 loops, best of 3: 222 ns per loop
# sum_loop 499999500000
# 10 loops, best of 3: 55.6 ms per loop
# sum_loop_nb 499999500000
# 10000000 loops, best of 3: 196 ns per loop
# sum_gen 499999500000
# 10 loops, best of 3: 51.7 ms per loop
# sum_list 499999500000
# 10 loops, best of 3: 68.4 ms per loop
# sum_range 499999500000
# 100 loops, best of 3: 17.8 ms per loop

It is unlikely that you will observe substantially different timings across Python versions.

The sum_analytical() and sum_loop_nb() versions are included just for fun and are not analyzed further. sum_list() also behaves quite differently from the rest, since it builds a large and largely unnecessary list just to perform the computation, so it is not analyzed further either.
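
As a rough illustration of that difference (a small sketch; the exact sizes vary by Python build), the list comprehension materializes every element up front, while the generator expression only ever holds one item at a time:

import sys

n = 1000000
lst = [x for x in range(n)]   # what sum_list() builds before summing
gen = (x for x in range(n))   # what sum_gen() hands to sum()

print(sys.getsizeof(lst))  # on the order of several MB, for the pointers alone
print(sys.getsizeof(gen))  # a small, constant-size generator object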


The reason for these different timings lies, of course, in the bytecode produced for each version of the function. In particular, going from sum_loop() to sum_range(), the bytecode gets progressively simpler:

import dis

funcs = sum_loop, sum_gen, sum_range
for func in funcs:
    print(func.__name__)
    print(dis.dis(func))
    print()

# sum_loop
#   2           0 LOAD_CONST               1 (0)
#               2 STORE_FAST               1 (result)

#   3           4 SETUP_LOOP              24 (to 30)
#               6 LOAD_GLOBAL              0 (range)
#               8 LOAD_FAST                0 (n)
#              10 CALL_FUNCTION            1
#              12 GET_ITER
#         >>   14 FOR_ITER                12 (to 28)
#              16 STORE_FAST               2 (x)

#   4          18 LOAD_FAST                1 (result)
#              20 LOAD_FAST                2 (x)
#              22 INPLACE_ADD
#              24 STORE_FAST               1 (result)
#              26 JUMP_ABSOLUTE           14
#         >>   28 POP_BLOCK

#   5     >>   30 LOAD_FAST                1 (result)
#              32 RETURN_VALUE
# None

# sum_gen
#   9           0 LOAD_GLOBAL              0 (sum)
#               2 LOAD_CONST               1 (<code object <genexpr> at 0x7f86d67c49c0, file "<ipython-input-4-9519b0039c88>", line 9>)
#               4 LOAD_CONST               2 ('sum_gen.<locals>.<genexpr>')
#               6 MAKE_FUNCTION            0
#               8 LOAD_GLOBAL              1 (range)
#              10 LOAD_FAST                0 (n)
#              12 CALL_FUNCTION            1
#              14 GET_ITER
#              16 CALL_FUNCTION            1
#              18 CALL_FUNCTION            1
#              20 RETURN_VALUE
# None

# sum_range
#  13           0 LOAD_GLOBAL              0 (sum)
#               2 LOAD_GLOBAL              1 (range)
#               4 LOAD_FAST                0 (n)
#               6 CALL_FUNCTION            1
#               8 CALL_FUNCTION            1
#              10 RETURN_VALUE
# None
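
To see the per-item work that sum_gen() still has to do, one can also disassemble the code object of the generator expression itself (a small sketch; locating the nested code object via co_consts is just one convenient way to get at it):

import dis

def sum_gen(n):
    return sum(x for x in range(n))

# The generator expression compiles to a nested code object stored among
# sum_gen's constants; find it and disassemble it.
genexpr_code = next(c for c in sum_gen.__code__.co_consts
                    if hasattr(c, "co_code"))
dis.dis(genexpr_code)
# Its loop body (FOR_ITER / STORE_FAST / LOAD_FAST / YIELD_VALUE / ...) is
# executed in Python once per element, which is why sum_gen() lands in the
# same ballpark as sum_loop(), while sum_range() keeps the entire loop in C.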

Upvotes: 2

AKX

Reputation: 169398

Let's use timeit for proper benchmarking and, to make it easy to compare different Python versions as well, let's run the benchmark in Docker containers:

so62514160.py

N = 1000000

def m1():
    s = 0
    for i in range(N):
        s += i

def m2():
    s = sum(i for i in range(N))

def m3():
    s = sum(range(N))

so62514160bench.sh

for image in python:2.7 python:3.6 python:3.7 python:3.8; do
    for fun in m1 m2 m3; do
        echo -n "$image" "$fun "
        docker run --rm -it -v $(pwd):/app -w /app -e PYTHONDONTWRITEBYTECODE=1 "$image" python -m timeit -s 'import so62514160 as s' "s.$fun()"
    done
done

results on my machine:

python:2.7 m1 10 loops, best of 3:  43.5 msec per loop
python:2.7 m2 10 loops, best of 3:  39.6 msec per loop
python:2.7 m3 100 loops, best of 3: 17.1 msec per loop
python:3.6 m1 10 loops, best of 3:  41.9 msec per loop
python:3.6 m2 10 loops, best of 3:  46 msec per loop
python:3.6 m3 100 loops, best of 3: 17.7 msec per loop
python:3.7 m1 5 loops, best of 5:   45 msec per loop
python:3.7 m2 5 loops, best of 5:   40.7 msec per loop
python:3.7 m3 20 loops, best of 5:  17.3 msec per loop
python:3.8 m1 5 loops, best of 5:   48.2 msec per loop
python:3.8 m2 5 loops, best of 5:   44.6 msec per loop
python:3.8 m3 10 loops, best of 5:  19.2 msec per loop

[plot: bar chart of the timings above, per Python version and method]

Upvotes: 5

user13783164

Reputation:

I get different numbers:

python3 main.py
499999500000 0.0832064151763916
499999500000 0.03934478759765625

And from this code:

import time

to0 = []
to1 = []

for _ in range(1000):
    t0 = time.time()
    s = 0
    for i in range(1000000):
        s += i
    t1 = time.time()

    to0.append(t1 - t0)

    t0 = time.time()
    s = sum(i for i in range(1000000))
    t1 = time.time()

    to1.append(t1 - t0)


print(sum(to0)/len(to0))
print(sum(to1)/len(to1))

I get:

0.07862246823310852
0.0318267240524292

Try updating your Python; all of this was run on Python 3.7.3.

Upvotes: 0

zvone

Reputation: 19392

Ah, I found an answer myself, but it brings up another question.

So, this is much faster:

t0 = time.time()
s = sum(range(1000000))
t1 = time.time()

print(s, t1 - t0)

The result is:

499999500000 0.05099987983703613

So, sum is faster than +=, as expected, but the generator expression (i for i in range(n)) is much slower than anything else.

I must say that this is also quite surprising.
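
For what it's worth, a quick check with timeit (a minimal sketch; number=10 is an arbitrary choice) reproduces the gap, which suggests the overhead is the per-item resumption of the generator rather than sum() itself:

import timeit

n = 1000000
# Summing the range directly keeps the whole loop in C.
print(timeit.timeit("sum(range(n))", globals={"n": n}, number=10))
# Wrapping the same range in a do-nothing generator forces a Python-level
# step per element, which is where the extra time goes.
print(timeit.timeit("sum(i for i in range(n))", globals={"n": n}, number=10))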

Upvotes: 1
