Reputation: 19392
Here is my simple code example:
import time
t0 = time.time()
s = 0
for i in range(1000000):
    s += i
t1 = time.time()
print(s, t1 - t0)
t0 = time.time()
s = sum(i for i in range(1000000))
t1 = time.time()
print(s, t1 - t0)
On my computer (with Python 3.8) it prints:
499999500000 0.22901296615600586
499999500000 1.6930372714996338
So, doing += a million times is 7 times faster than calling sum? That is really unexpected. What is it doing?
Edit: I foolishly allowed a debugger to attach to the process and interfere with my measurements, which turned out to be the cause of the slowness. With the debugger detached, the measurements are no longer so unpredictable. As some of the answers clearly show, what I observed should not happen.
Upvotes: 6
Views: 349
Reputation: 26956
First of all, your observation probably does not generalize well to other systems: this way of measuring is quite unreliable, because it is susceptible to performance fluctuations that are dominated by how your OS reacts to the system load at the time of measurement.
You should use timeit or something similar.
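For instance, a minimal timeit comparison of the two original snippets could look like the sketch below (the function names are just for the demo):

import timeit

def with_loop(n=1000000):
    s = 0
    for i in range(n):
        s += i
    return s

def with_gen(n=1000000):
    return sum(i for i in range(n))

# timeit disables garbage collection while timing and lets you repeat
# the measurement; these print the total time for 10 runs each
print(timeit.timeit(with_loop, number=10))
print(timeit.timeit(with_gen, number=10))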
For example, these are the timings I get on Python 3.6 in a virtual environment (Google Colab), and they seem to be quite consistent with the other answers:
import numba as nb

def sum_loop(n):
    result = 0
    for x in range(n):
        result += x
    return result

sum_loop_nb = nb.jit(sum_loop)
sum_loop_nb.__name__ = 'sum_loop_nb'

def sum_analytical(n):
    # closed form: 0 + 1 + ... + (n - 1) == n * (n - 1) / 2
    return n * (n - 1) // 2

def sum_list(n):
    return sum([x for x in range(n)])

def sum_gen(n):
    return sum(x for x in range(n))

def sum_range(n):
    return sum(range(n))

sum_loop_nb(10)  # to trigger compilation

funcs = sum_analytical, sum_loop, sum_loop_nb, sum_gen, sum_list, sum_range
n = 1000000
for func in funcs:
    print(func.__name__, func(n))
    %timeit func(n)
# sum_analytical 499999500000
# 10000000 loops, best of 3: 222 ns per loop
# sum_loop 499999500000
# 10 loops, best of 3: 55.6 ms per loop
# sum_loop_nb 499999500000
# 10000000 loops, best of 3: 196 ns per loop
# sum_gen 499999500000
# 10 loops, best of 3: 51.7 ms per loop
# sum_list 499999500000
# 10 loops, best of 3: 68.4 ms per loop
# sum_range 499999500000
# 100 loops, best of 3: 17.8 ms per loop
It is unlikely that you will observe very different timings across Python versions.
The sum_analytical() and sum_loop_nb() versions have been included just for fun and are not analyzed further. sum_list() also behaves quite differently from the rest, as it creates a large and largely unnecessary intermediate list for the computation, so it is not analyzed further either.
The reason for these different timings is, of course, in the bytecode produced by the respective versions of the functions. In particular, from sum_loop() through sum_range() one gets progressively simpler code:
import dis

funcs = sum_loop, sum_gen, sum_range
for func in funcs:
    print(func.__name__)
    print(dis.dis(func))
    print()
# sum_loop
# 2 0 LOAD_CONST 1 (0)
# 2 STORE_FAST 1 (result)
# 3 4 SETUP_LOOP 24 (to 30)
# 6 LOAD_GLOBAL 0 (range)
# 8 LOAD_FAST 0 (n)
# 10 CALL_FUNCTION 1
# 12 GET_ITER
# >> 14 FOR_ITER 12 (to 28)
# 16 STORE_FAST 2 (x)
# 4 18 LOAD_FAST 1 (result)
# 20 LOAD_FAST 2 (x)
# 22 INPLACE_ADD
# 24 STORE_FAST 1 (result)
# 26 JUMP_ABSOLUTE 14
# >> 28 POP_BLOCK
# 5 >> 30 LOAD_FAST 1 (result)
# 32 RETURN_VALUE
# None
# sum_gen
# 9 0 LOAD_GLOBAL 0 (sum)
# 2 LOAD_CONST 1 (<code object <genexpr> at 0x7f86d67c49c0, file "<ipython-input-4-9519b0039c88>", line 9>)
# 4 LOAD_CONST 2 ('sum_gen.<locals>.<genexpr>')
# 6 MAKE_FUNCTION 0
# 8 LOAD_GLOBAL 1 (range)
# 10 LOAD_FAST 0 (n)
# 12 CALL_FUNCTION 1
# 14 GET_ITER
# 16 CALL_FUNCTION 1
# 18 CALL_FUNCTION 1
# 20 RETURN_VALUE
# None
# sum_range
# 13 0 LOAD_GLOBAL 0 (sum)
# 2 LOAD_GLOBAL 1 (range)
# 4 LOAD_FAST 0 (n)
# 6 CALL_FUNCTION 1
# 8 CALL_FUNCTION 1
# 10 RETURN_VALUE
# None
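A rough way to quantify "progressively simpler" is to count instructions with dis.get_instructions (in the standard library since Python 3.4). The key difference is that sum_loop() re-executes the seven bytecodes of its loop body once per element, while sum_range() runs its handful of bytecodes exactly once and lets sum() consume range() entirely in C:

import dis

# Static instruction counts per function; the loop version additionally
# re-executes its loop-body bytecodes once per element at run time.
for func in (sum_loop, sum_gen, sum_range):
    print(func.__name__, sum(1 for _ in dis.get_instructions(func)))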
Upvotes: 2
Reputation: 169398
Let's use timeit for proper benchmarking and, to make it easy to also compare different Python versions, run it in Docker containers. The functions live in a module named so62514160.py:
N = 1000000

def m1():
    s = 0
    for i in range(N):
        s += i

def m2():
    s = sum(i for i in range(N))

def m3():
    s = sum(range(N))
for image in python:2.7 python:3.6 python:3.7 python:3.8; do
    for fun in m1 m2 m3; do
        echo -n "$image" "$fun "
        docker run --rm -it -v $(pwd):/app -w /app -e PYTHONDONTWRITEBYTECODE=1 "$image" python -m timeit -s 'import so62514160 as s' "s.$fun()"
    done
done
python:2.7 m1 10 loops, best of 3: 43.5 msec per loop
python:2.7 m2 10 loops, best of 3: 39.6 msec per loop
python:2.7 m3 100 loops, best of 3: 17.1 msec per loop
python:3.6 m1 10 loops, best of 3: 41.9 msec per loop
python:3.6 m2 10 loops, best of 3: 46 msec per loop
python:3.6 m3 100 loops, best of 3: 17.7 msec per loop
python:3.7 m1 5 loops, best of 5: 45 msec per loop
python:3.7 m2 5 loops, best of 5: 40.7 msec per loop
python:3.7 m3 20 loops, best of 5: 17.3 msec per loop
python:3.8 m1 5 loops, best of 5: 48.2 msec per loop
python:3.8 m2 5 loops, best of 5: 44.6 msec per loop
python:3.8 m3 10 loops, best of 5: 19.2 msec per loop
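If Docker is not at hand, a quick in-process approximation of the same comparison is possible with timeit.repeat, reusing m1, m2, and m3 from above (taking the minimum over repeats mimics what the timeit command line reports):

import timeit

# min over repeats approximates the "best of N" from the CLI runs
for fun in (m1, m2, m3):
    best = min(timeit.repeat(fun, number=10, repeat=5)) / 10
    print(fun.__name__, '%.1f msec per loop' % (best * 1000))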
Upvotes: 5
Reputation:
I get different numbers:
python3 main.py
499999500000 0.0832064151763916
499999500000 0.03934478759765625
And from this code:
import time

to0 = []
to1 = []
for i in range(1000):
    t0 = time.time()
    s = 0
    for j in range(1000000):  # renamed to avoid shadowing the outer i
        s += j
    t1 = time.time()
    to0.append(t1 - t0)

    t0 = time.time()
    s = sum(j for j in range(1000000))
    t1 = time.time()
    to1.append(t1 - t0)

print(sum(to0) / len(to0))
print(sum(to1) / len(to1))
I get
0.07862246823310852
0.0318267240524292
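Averages can still hide outliers; it is cheap to also report the minimum, which is closer to what timeit does. A sketch reusing the to0 and to1 lists collected above:

import statistics

# reuses to0 and to1 from the loop above
print(statistics.mean(to0), min(to0))
print(statistics.mean(to1), min(to1))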
Try updating your Python; all of this was run on Python 3.7.3.
Upvotes: 0
Reputation: 19392
Ah, I found an answer myself, but it brings up another question.
So, this is much faster:
import time

t0 = time.time()
s = sum(range(1000000))
t1 = time.time()
print(s, t1 - t0)
The result is:
499999500000 0.05099987983703613
So, sum is faster than +=, as expected, but the generator expression (i for i in range(n)) is much slower than anything else.
I must say that this is also quite surprising.
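A plausible explanation (my understanding of CPython, not something stated above): each value produced by the generator expression requires resuming a Python-level frame, whereas sum() can consume a range object entirely in C. The different iterator types are easy to see:

# Illustrative only: the generator carries a Python frame that must be
# resumed per item; the range iterator is implemented purely in C.
gen = (i for i in range(3))
it = iter(range(3))
print(type(gen))  # <class 'generator'>
print(type(it))   # <class 'range_iterator'>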
Upvotes: 1