Karl Knechtel

Reputation: 61643

itertools vs. generator expressions for flattening and repetition - how can we explain these timing results?

Motivated by discussion on this question, I decided to try some performance testing. The task I have set is somewhat simpler - given a source list A, we wish to create a lazy iterable that repeats each element of A, N times:

def test(implementation):
    A, N = list('abc'), 3
    assert list(implementation(A, N)) == list('aaabbbccc')

I came up with several implementations, and tested them thus:

from itertools import chain, repeat, starmap
from timeit import timeit

flatten = chain.from_iterable

def consume(iterable):
    for _ in iterable:
        pass

# FAST approaches
def tools(original, count):
    return flatten(map(repeat, original, repeat(count)))

def tools_star(original, count):
    return flatten(starmap(repeat, zip(original, repeat(count))))

def mixed(original, count):
    return flatten(repeat(a, count) for a in original)

# SLOW approaches
def mixed2(original, count):
    return (x for a in original for x in repeat(a, count))

def explicit(original, count):
    for a in original:
        for _ in range(count):
            yield a

def generator(original, count):
    return (a for a in original for _ in range(count))

def mixed3(original, count):
    return flatten((a for _ in range(count)) for a in original)

if __name__ == '__main__':
    for impl in (tools, tools_star, mixed, mixed2, explicit, generator, mixed3):
        for consumption in (consume, list):
            to_time = lambda: consumption(impl(list(range(1000)), 1000))
            elapsed = timeit(to_time, number=100)
            print(f'{consumption.__name__}({impl.__name__}): {elapsed:.2f}')

Here are three examples of timing results on my machine:

consume(tools): 1.10
list(tools): 2.96
consume(tools_star): 1.10
list(tools_star): 2.97
consume(mixed): 1.11
list(mixed): 2.91
consume(mixed2): 4.60
list(mixed2): 6.53
consume(explicit): 5.45
list(explicit): 8.09
consume(generator): 5.98
list(generator): 7.62
consume(mixed3): 5.75
list(mixed3): 7.67

consume(tools): 1.10
list(tools): 2.88
consume(tools_star): 1.10
list(tools_star): 2.89
consume(mixed): 1.11
list(mixed): 2.87
consume(mixed2): 4.56
list(mixed2): 6.39
consume(explicit): 5.42
list(explicit): 7.24
consume(generator): 5.91
list(generator): 7.48
consume(mixed3): 5.80
list(mixed3): 7.61

consume(tools): 1.14
list(tools): 2.98
consume(tools_star): 1.10
list(tools_star): 2.90
consume(mixed): 1.11
list(mixed): 2.92
consume(mixed2): 4.76
list(mixed2): 6.49
consume(explicit): 5.69
list(explicit): 7.38
consume(generator): 5.68
list(generator): 7.52
consume(mixed3): 5.75
list(mixed3): 7.86

From these results, I'm left with the following questions:

What exactly is going on under the hood of itertools? How can we explain these timing results? It's especially strange that chain.from_iterable and repeat offer no incremental performance benefit here; they only pay off when used together. And what is going on with the list construction? Isn't the added overhead the same in each case (repeatedly appending the same sequence of elements to an empty list)?

Upvotes: -6

Views: 239

Answers (1)

Shadows In Rain

Reputation: 1220

It mostly boils down to the Big-O of the amount of time spent in the interpreter.

  • Having no loops in the interpreter lets the C functions feed values to each other directly.
    • But nesting so many itertools adds a small but measurable overhead.
      • Marked I in the table below.
  • One interpreter loop is just ×1000 repetitions of a few opcodes.
  • A nested interpreter loop is a whopping ×1000000 of the same (see the scaling sketch after this list).
  • Yielding directly from repeat is a few opcodes shorter than storing the value from range and then yielding (a bytecode sketch follows the table below).
    • Marked Y in the table below.
  • explicit and generator are virtually equivalent.
  • A nested generator means a nested function call, which is expensive.
    • Marked G in the table below.
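
To make the loop-count point concrete, here is a small sketch of my own (separate from the timing harness in the question) that holds the source size fixed and scales count. The per-output work of mixed happens in C, so its time should barely move, while mixed2 pays an interpreter iteration for every output element and should grow roughly with count. Exact numbers will vary by machine.

from itertools import chain, repeat
from timeit import timeit

flatten = chain.from_iterable

def mixed(original, count):
    # one interpreter iteration per *source* element; flattening the
    # repeat objects happens inside chain.from_iterable, in C
    return flatten(repeat(a, count) for a in original)

def mixed2(original, count):
    # one interpreter iteration per *output* element
    return (x for a in original for x in repeat(a, count))

def consume(iterable):
    for _ in iterable:
        pass

if __name__ == '__main__':
    A = list(range(1000))
    for count in (1, 100):
        for impl in (mixed, mixed2):
            elapsed = timeit(lambda: consume(impl(A, count)), number=100)
            print(f'count={count:>3} {impl.__name__}: {elapsed:.3f}s')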

Here are my results:

method      complexity (interpreter only)  list [s]  consume [s]
tools       O(0) + I                       0.21      0.09
tools_star  O(0) + I                       0.21      0.09
mixed       O(N)                           0.20      0.09
mixed2      O(N²)                          0.54      0.47
explicit    O(N²) + Y                      0.64      0.60
generator   O(N²) + Y                      0.64      0.60
mixed3      O(N²) + G                      0.71      0.65

Seems pretty convincing.
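
If you want to check the opcode claims yourself, dis can disassemble the generator expressions' code objects. This is only an inspection aid of mine (nothing here is timed, and the variable names are just for the sketch); it shows exactly which opcodes execute for every yielded element in the mixed2-style and generator-style inner loops.

import dis
from itertools import repeat

original, count = list('abc'), 3

mixed2_style = (x for a in original for x in repeat(a, count))
generator_style = (a for a in original for _ in range(count))

dis.dis(mixed2_style.gi_code)     # inner loop: store x from repeat, load x, yield
dis.dis(generator_style.gi_code)  # inner loop: store _ from range, load a, yield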

I've also modified the code a little to improve the timing method and readability (it reuses the producer functions and consume from the question):

import timeit

# tools, tools_star, mixed, mixed2, explicit, generator, mixed3 and consume
# are as defined in the question.
PRODUCERS = (tools, tools_star, mixed, mixed2, explicit, generator, mixed3)
CONSUMERS = (list, consume)
N = 1000
SAMPLES = 50
BATCH = 10

if __name__ == '__main__':
    for consumer in CONSUMERS:
        print(f"{consumer.__name__}:")
        for producer in PRODUCERS:
            to_time = lambda: consumer(producer(list(range(N)), N))
            elapsed = timeit.repeat(to_time, repeat=SAMPLES, number=BATCH)
            emin = min(elapsed)  # take the minimum to filter out random fluctuations and approximate the best-case time
            print(f'{emin:02.2f} | {producer.__name__}')
        print()

Upvotes: 4
