Jim Witschey

Reputation: 285

Why is this slicing code faster than more procedural code?

I have a Python function that takes a list and returns a generator yielding 2-tuples of each adjacent pair, e.g.

>>> list(pairs([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]

I've considered an implementation using 2 slices:

def pairs(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

and one written in a more procedural style:

def pairs(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x

Testing using range(2 ** 15) as input yields the following times (you can find my testing code and output here):

2 slices: 100 loops, best of 3: 4.23 msec per loop
0 slices: 100 loops, best of 3: 5.68 msec per loop

Part of the performance hit for the sliceless implementation is the comparison in the loop (if last is not dummy). Removing that (making the output incorrect) improves its performance, but it's still slower than the zip-a-pair-of-slices implementation:

2 slices: 100 loops, best of 3: 4.48 msec per loop
0 slices: 100 loops, best of 3: 5.2 msec per loop

So, I'm stumped. Why is zipping together 2 slices, effectively iterating over the list twice in parallel, faster than iterating once, updating last and x as you go?
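For reference, a minimal benchmark along these lines (a sketch using timeit rather than the original timing harness; exact numbers will vary by machine and Python version) looks like:

```python
import timeit

def pairs_slices(xs):
    # zip two offset slices and yield each adjacent pair
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairs_loop(xs):
    # track the previous element explicitly with a sentinel
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

xs = list(range(2 ** 15))

# consume each generator fully so the loop body actually runs
for name, fn in [("2 slices", pairs_slices), ("0 slices", pairs_loop)]:
    t = timeit.timeit(lambda: list(fn(xs)), number=100)
    print("%s: %.2f msec per loop" % (name, t / 100 * 1000))
```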

EDIT

Dan Lenski proposed a third implementation:

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]

Here's its comparison to the other implementations:

2 slices: 100 loops, best of 3: 4.37 msec per loop
0 slices: 100 loops, best of 3: 5.61 msec per loop
Lenski's: 100 loops, best of 3: 6.43 msec per loop

It's even slower! Which is baffling to me.

EDIT 2:

ssm suggested using itertools.izip instead of zip, and it's even faster than zip:

2 slices, izip: 100 loops, best of 3: 3.68 msec per loop

So, izip is the winner so far! But still for reasons that are difficult to inspect.
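(A note for readers on Python 3: itertools.izip only exists in Python 2. In Python 3 the built-in zip is already lazy, so the izip variant is written with plain zip:)

```python
def pairs(xs):
    # In Python 3, zip() returns a lazy iterator (like Python 2's
    # itertools.izip), so no intermediate list of pairs is built.
    for p in zip(xs[:-1], xs[1:]):
        yield p

print(list(pairs([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```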

Upvotes: 2

Views: 1618

Answers (3)

Dan Lenski

Reputation: 79742

Lots of interesting discussion elsewhere in this thread. Basically, we started out comparing two versions of this function, which I'm going to describe with the following dumb names:

  1. The "zip-py" version:

    def pairs(xs):
        for p in zip(xs[:-1], xs[1:]): 
            yield p
    
  2. The "loopy" version:

    def pairs(xs):
        last = object()
        dummy = last
        for x in xs:
            if last is not dummy:
                yield last,x
            last = x
    

So why does the loopy version turn out to be slower? Basically, I think it comes down to a couple things:

  1. The loopy version explicitly does extra work: it compares two objects' identities (if last is not dummy: ...) on every pair-generating iteration of the inner loop.

    • @mambocab's edit shows that not doing this comparison does make the loopy version
      slightly faster, but doesn't fully close the gap.

  2. The zippy version does more stuff in compiled C code that the loopy version does in Python code:

    • Combining two objects into a tuple. The loopy version does yield last,x, while in the zippy version the tuple p comes straight from zip, so it just does yield p.

    • Binding variable names to objects: the loopy version does this twice in every loop iteration, assigning x in the for statement and last = x at the end. The zippy version does this just once, in the for loop.

  3. Interestingly, there is one way in which the zippy version is actually doing more work: it uses two listiterators, iter(xs[:-1]) and iter(xs[1:]), which get passed to zip. The loopy version only uses one listiterator (for x in xs).

    • Creating a listiterator object (the output of iter([])) is likely a very highly optimized operation since Python programmers use it so frequently.
    • Creating and iterating over the list slices, xs[:-1] and xs[1:], adds little overhead compared to iterating over the whole list. The slices are copies, but the copy is made in a single pass of optimized C, and iterating over a slice costs the same per item as iterating over the original list.
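As an aside, if the slice copies ever became a concern (for example, when the input is a generic iterable rather than a list), the standard pairwise recipe from the itertools documentation avoids them with itertools.tee. This is a sketch of that recipe, not part of the original benchmarks:

```python
from itertools import tee

def pairwise(iterable):
    # recipe from the itertools documentation: duplicate the iterator,
    # advance one copy by a single step, then zip the two copies
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

print(list(pairwise([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```

This works on any iterable, including ones that cannot be sliced at all.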

Upvotes: 2

ssm

Reputation: 5373

These are the results for izip, which is actually closer to your implementation. They look like what you would expect. The zip version creates the entire list of pairs in memory within the function, so it is the fastest. The loop version just loops through the list, so it is a little slower. The izip version most closely resembles your code, but I am guessing there are some memory-management backend processes that increase the execution time.

In [11]: %timeit pairsLoop([1,2,3,4,5])
1000000 loops, best of 3: 651 ns per loop

In [12]: %timeit pairsZip([1,2,3,4,5])
1000000 loops, best of 3: 637 ns per loop

In [13]: %timeit pairsIzip([1,2,3,4,5])
1000000 loops, best of 3: 655 ns per loop

The code versions are shown below, as requested:

from itertools import izip


def pairsIzip(xs):
    for p in izip(xs[:-1], xs[1:]): 
        yield p

def pairsZip(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

def pairsLoop(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x
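One caveat about these timings: calling a generator function only constructs the generator object; its body does not run until the generator is iterated. To time the actual pairing work, the generator has to be consumed, e.g. with %timeit list(pairsZip(xs)) rather than %timeit pairsZip(xs). A small sketch of the difference:

```python
def pairsZip(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p

gen = pairsZip([1, 2, 3, 4])   # nearly free: no pairing has happened yet
result = list(gen)             # the loop body only runs here
print(result)                  # [(1, 2), (2, 3), (3, 4)]
```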

Upvotes: 2

Dan Lenski

Reputation: 79742

I suspect the main reason that the second version is slower is because it does a comparison operation for every single pair that it yields:

# pair-generating loop
for x in xs:
    if last is not dummy:
       yield last,x
    last = x

The first version does not do anything but spit out values. With the loop variables renamed, it's equivalent to this:

# pair-generating loop
for last,x in zip(xs[:-1], xs[1:]):
    yield last,x 

It's not especially pretty or Pythonic, but you could write a procedural version without a comparison in the inner loop. How fast does this one run?

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]
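Whatever the relative speeds turn out to be, it's worth confirming that the variants agree on their output. A quick sanity check (using my own names for the three versions):

```python
def pairs_zip(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairs_loop(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

def pairs_index(xs):
    for ii in range(1, len(xs)):
        yield xs[ii - 1], xs[ii]

xs = [1, 2, 3, 4]
expected = [(1, 2), (2, 3), (3, 4)]
assert list(pairs_zip(xs)) == expected
assert list(pairs_loop(xs)) == expected
assert list(pairs_index(xs)) == expected
print("all three implementations agree")
```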

Upvotes: 1
