Reputation: 27588
(Update: Might only happen in CPython 3.8 32-bit for Windows, so don't be surprised if you can't reproduce it in other versions. See tables in the Update section.)
Both iter
and reversed
result in specialized iterators for lists:
>>> iter([1, 2, 3])
<list_iterator object at 0x031495C8>
>>> reversed([1, 2, 3])
<list_reverseiterator object at 0x03168310>
But the reversed
one is much slower:
> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
50000 loops, best of 5: 5.76 usec per loop
> python -m timeit -s "a = list(range(1000))" "list(reversed(a))"
20000 loops, best of 5: 14.2 usec per loop
And I can consistently reproduce it. I later tried iter
five more times, took 5.98, 5.84, 5.85, 5.87, 5.86. Then reversed
five more times, took 14.3, 14.4, 14.4, 14.5, 14.3.
I thought maybe iter
benefits from increasing memory locations of the list's elements, so I tried reversing the list beforehand. Same picture:
> python -m timeit -s "a = list(range(1000)); a.reverse()" "list(iter(a))"
50000 loops, best of 5: 5.73 usec per loop
> python -m timeit -s "a = list(range(1000)); a.reverse()" "list(reversed(a))"
20000 loops, best of 5: 14.1 usec per loop
Same picture with sum
as well:
> python -m timeit -s "a = list(range(1000))" "sum(iter(a))"
20000 loops, best of 5: 10.7 usec per loop
> python -m timeit -s "a = list(range(1000))" "sum(reversed(a))"
10000 loops, best of 5: 20.9 usec per loop
And with identical elements, too:
> python -m timeit -s "a = [None] * 1000" "list(iter(a))"
50000 loops, best of 5: 6.35 usec per loop
> python -m timeit -s "a = [None] * 1000" "list(reversed(a))"
20000 loops, best of 5: 14.5 usec per loop
Why is the reverse iterator so much slower?
I'm using CPython 3.8.1 32 bit on Windows 10 pro 64 bit version 1903 with an Intel i5-7200U (it's a HUAWEI MateBook X). No special configuration, just a normal Python install on a normal Windows install.
Update: I ran a larger automated test with eight different Python versions (all freshly installed with default settings) on another machine (Pentium N3700, Windows 10 Pro 64-bit 1903). Times in usec per loop:
32-bit 64-bit
CPython iter reversed iter reversed
3.5.4 16.6 17.0 15.2 16.2
3.6.8 16.8 17.2 14.9 15.8
3.7.6 16.5 16.9 14.8 15.5
3.8.1 16.3 22.1 14.6 15.5
Two things to note:
reversed
is the only one much slower. Might explain why almost nobody else could reproduce it.reversed
was a bit slower than iter
. About 0.4 usec in 32-bit and about 0.9 usec in 64-bit.I ran those 16 tests in Round-robin fashion for ten rounds, and each time shown above is the best of its ten source times. Each of the 160 source times was done like this:
python.exe -m timeit -r 5 -s "a = list(range(1000))" "list(iter(a))"
or
python.exe -m timeit -r 5 -s "a = list(range(1000))" "list(reversed(a))"
The times for each of the 16 tests were pretty consistent. Full table (note that the round-robin means I ran these column by column, not row by row):
3.5.4 32-bit iter [16.7, 16.6, 17.3, 16.6, 16.7, 16.6, 16.6, 16.6, 16.6, 16.7]
3.5.4 32-bit reversed [17.1, 17.1, 17.1, 17.2, 17.1, 17.1, 17.0, 17.1, 17.1, 17.1]
3.5.4 64-bit iter [15.2, 15.4, 15.4, 15.4, 15.4, 15.4, 15.4, 15.3, 15.4, 15.3]
3.5.4 64-bit reversed [16.8, 16.2, 16.3, 16.3, 16.2, 16.2, 16.2, 16.2, 16.2, 16.3]
3.6.8 32-bit iter [17.3, 16.9, 16.8, 16.9, 16.9, 16.8, 16.9, 16.9, 16.8, 16.8]
3.6.8 32-bit reversed [17.2, 17.2, 17.2, 17.3, 17.3, 17.3, 17.3, 17.2, 17.2, 17.2]
3.6.8 64-bit iter [15.0, 14.9, 15.9, 14.9, 14.9, 15.0, 14.9, 14.9, 14.9, 14.9]
3.6.8 64-bit reversed [15.8, 15.9, 16.4, 15.9, 15.9, 16.0, 15.8, 15.9, 15.9, 15.8]
3.7.6 32-bit iter [16.6, 17.2, 16.6, 16.5, 16.7, 16.7, 16.5, 16.5, 16.5, 16.7]
3.7.6 32-bit reversed [17.2, 17.6, 17.0, 17.0, 16.9, 17.2, 17.3, 17.0, 17.5, 17.0]
3.7.6 64-bit iter [14.8, 15.1, 14.9, 14.9, 14.8, 15.1, 14.9, 14.8, 15.0, 14.9]
3.7.6 64-bit reversed [16.0, 20.1, 15.7, 15.6, 15.6, 15.6, 15.7, 15.7, 15.8, 15.5]
3.8.1 32-bit iter [16.4, 16.6, 16.3, 16.4, 16.5, 16.4, 16.5, 16.4, 16.8, 16.4]
3.8.1 32-bit reversed [22.3, 22.4, 22.2, 22.3, 22.3, 22.3, 22.5, 22.4, 22.3, 22.1]
3.8.1 64-bit iter [14.6, 15.1, 14.6, 14.7, 14.7, 14.7, 14.7, 14.6, 14.6, 14.6]
3.8.1 64-bit reversed [15.5, 16.1, 15.5, 15.6, 15.5, 15.5, 15.5, 15.5, 15.5, 15.5]
The same test on a list with a million values (list(range(250)) * 4000
). Times are msec per loop:
32-bit 64-bit
CPython iter reversed iter reversed
3.5.4 19.8 19.9 22.4 22.7
3.6.8 19.8 19.9 22.3 22.6
3.7.6 19.9 19.9 22.3 22.5
3.8.1 19.8 24.9 22.4 22.6
The variation is even smaller, except reversed
on 3.8.1 32-bit is much slower again.
One more, just with CPython 3.8.0 instead of 3.8.1, where it also happens.
32-bit 64-bit
CPython iter reversed iter reversed
3.5.4 19.5 19.6 21.9 22.2
3.6.8 19.5 19.7 21.8 22.1
3.7.6 19.5 19.6 21.7 22.0
3.8.0 19.4 24.5 21.7 22.1
Upvotes: 19
Views: 1423
Reputation: 226256
I've tried this on Python 3.9 and Python 3.11 using the stock macOS builds from python.org. The timings show that forward and reverse iteration run at close to the same speed:
% python3.11 -m timeit -s "a = list(range(1000))" "list(iter(a))"
100000 loops, best of 5: 2.84 usec per loop
% python3.11 -m timeit -s "a = list(range(1000))" "list(reversed(a))"
100000 loops, best of 5: 2.85 usec per loop
% python3.9 -m timeit -s "a = list(range(1000))" "list(iter(a))"
100000 loops, best of 5: 2.87 usec per loop
% python3.9 -m timeit -s "a = list(range(1000))" "list(reversed(a))"
100000 loops, best of 5: 2.91 usec per loop
Examining the C source shows that the code for list_iterator
and list_reverseiterator
are almost identical.
The expectation is that the reversed iterator will be very slightly slower for two reasons:
The loop termination logic for reversed has two conditions index>=0 && index < PyList_GET_SIZE(seq)
instead of the one condition for forward iteration it->it_index < PyList_GET_SIZE(seq)
.
The memory access controllers on some CPUs will issue automatic prefetches when traversing an array in a forward direction but not in reverse. See §12.1 Optimizing Caching in Agner Fog's "Optimizing subroutines in assembly" language.
Given the source code is almost identical, the cause of variability is likely due to the compilation and build process.
Perhaps the Windows compiler can better optimize forward iteration with its simpler loop termination condition.
A more likely cause is that the Windows build is using Profile Guided Optimization (PGO) to inform the code generation step. This is very sensitive to how the profile is built. If the profiled code made little or no use of the reverse iterator, then it will not benefit from PGO.
The usual recommendation here is to rerun PGO using code that you care about rather than just the test suite. That will tune the build to your specific needs.
Consecutive integers created by range()
tend to be small objects and laid out in consecutive memory locations.
Also, Tte timing for a small number of iterations tends to also include the overhead of the calls to list, iter, and reversed.
For a better benchmark, I would use a larger, well shuffled dataset consisting of more interesting objects. Also, I would skip including list()
in the loop so the timing can focus just on the iteration rather than on the list building.
from timeit import repeat
setup = '''
from random import randrange
from random import randrange, shuffle
data = [str(randrange(100_000)) for i in range(10_000)]
shuffle(data)
'''
forward = 'for x in iter(data): pass'
backward = 'for x in reversed(data): pass'
Here are two runs:
>>> min(repeat(forward, setup, repeat=7, number=100_000))
4.0327267500106245
>>> min(repeat(backward, setup, repeat=7, number=100_000))
4.094246666994877
Upvotes: 3
Reputation: 40
I've been watching on python documentation and found this:
If the
__reversed__
() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__
() and__getitem__
()). Objects that support the sequence protocol should only provide__reversed__
() if they can provide an implementation that is more efficient than the one provided by reversed().
Probably is because of that.
Im not an expert but i saw the question is 7 months without answer and tryied to give you one based on python documentation.
Those are the resouces i used:
https://docs.python.org/3/library/functions.html#reversed https://docs.python.org/3/reference/datamodel.html#object
Upvotes: -1