qwwqwwq
qwwqwwq

Reputation: 7329

Why is python itertools "consume" recipe faster than calling next n times?

In the python documentation for itertools it provides the following "recipe" for advancing an iterator n steps:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

I'm wondering why this recipe is fundamentally different from something like this (aside from the handling of consuming the whole iterator):

def other_consume(iterable, n):
    for i in xrange(n):
        next(iterable, None)

I used timeit to confirm that, as expected, the above approach is much slower. What's going on in the recipe that allows for this superior performance? I get that it uses islice, but looking at islice, it APPEARS to be doing fundamentally the same thing as the code above:

def islice(iterable, *args):
    s = slice(*args)
    it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
    nexti = next(it)
    ### it seems as if this loop yields from the iterable n times via enumerate
    ### how is this different from calling next n times?
    for i, element in enumerate(iterable): 
        if i == nexti:
            yield element
            nexti = next(it)

Note: even if instead of importing islice from itertools I define it using the python equivalent from the docs shown above, the recipe is still faster..

EDIT: timeit code here:

timeit.timeit('a = iter([random() for i in xrange(1000000)]); consume(a, 1000000)', setup="from __main__ import consume,random", number=10)
timeit.timeit('a = iter([random() for i in xrange(1000000)]); other_consume(a, 1000000)', setup="from __main__ import other_consume,random", number=10)

other_consume is ~ 2.5x slower each time I run this

Upvotes: 3

Views: 3043

Answers (2)

Blckknght
Blckknght

Reputation: 104762

The reason that the recipe is faster is that its key pieces (islice, deque) are implemented in C, rather than in pure Python. Part of it is that a C loop is faster than for i in xrange(n). Another part is that Python function calls (e.g. next()) are more expensive than their C equivalents.

The version of itertools.islice that you've copied from the documentation is not correct, and its apparently great performance is because a consume function using it doesn't consume anything. (For that reason I'm not showing that version's test results below, though it was pretty fast! :)

Here are a couple different implementations, so we can test what is fastest:

import collections
from itertools import islice

# this is the official recipe
def consume_itertools(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

# your initial version, using a for loop on a range
def consume_qwwqwwq(iterator, n):
    for i in xrange(n):
        next(iterator, None)

# a slightly better version, that only has a single loop:
def consume_blckknght(iterator, n):
    if n <= 0:
        return
    for i, v in enumerate(iterator, start=1):
        if i == n:
            break

Timings on my system (Python 2.7.3 64-bit on Windows 7):

>>> test = 'consume(iter(xrange(100000)), 1000)'
>>> timeit.timeit(test, 'from consume import consume_itertools as consume')
7.623556181657534
>>> timeit.timeit(test, 'from consume import consume_qwwqwwq as consume')
106.8907442334584
>>> timeit.timeit(test, 'from consume import consume_blckknght as consume')
56.81081856366518

My assessment is that a nearly empty Python loop takes seven or eight times longer to run than the equivalent loop in C. Looping over two sequences at once (as consume_qwwqwwq does by calling next on iterator in addition to the for loop on the xrange) makes the cost roughly double.

Upvotes: 6

Martijn Pieters
Martijn Pieters

Reputation: 1123420

The documentation on itertools.islice() is flawed and doesn't handle the edgecase for start == stop properly. It is exactly that edgecase that consume() uses.

For islice(it, n, n), exactly n elements are consumed from it but nothing is ever yielded. Instead, StopIteration is raised after those n elements have been consumed.

The Python version you used to test with on the other hand raises StopIteration immediately without ever consuming anything from it. This makes any timings against this pure-python version incorrect and way too fast.

This is because the xrange(n, n, 1) iterator immediately raises StopIteration:

>>> it = iter(xrange(1, 1))
>>> print next(it)
Traceback (most recent call last):
  File "prog.py", line 4, in <module>
    print next(it)
StopIteration

Upvotes: 0

Related Questions