tlg

Reputation: 51

Speed to iterate multiple times over a generator compared to a list

I expected that in the case of multiple loops list iteration will be much faster than using a generator, and my code suggests this is false.

My understanding was that a list pays the cost of every operation (by operation I mean any expression defining an element) once, up front, so repeated passes only read stored values, while a generator repeats those operations on every pass.

And I checked my expectations using the following code:

from timeit import timeit

def pow2_list(n):
    """Return a list with powers of 2"""

    results = []

    for i in range(n):
        results.append(2**i)

    return results

def pow2_gen(n):
    """Generator of powers of 2"""

    for i in range(n):
        yield 2**i

def loop(iterator, n=1000):
    """Loop n times over iterable object"""

    for _ in range(n):
        for _ in iterator:
            pass

l = pow2_list(1000) # point to a list
g = pow2_gen(1000)  # point to a generator


time_list = \
    timeit("loop(l)", setup="from __main__ import loop, l", number=10)

time_gen = \
    timeit("loop(g)", setup="from __main__ import loop, g", number=10)

print("Loops over list took: ", time_list)
print("Loops over generator took: ", time_gen)

And the results surprised me...

Loops over list took:  0.20484769299946493
Loops over generator took:  0.0019217690005461918

Somehow using generators appears much faster than lists, even when looping over 1000 times. And in this case we are talking about two orders of magnitude! Why?

EDIT:

Thanks for the answers. Now I see my mistake: I wrongly assumed that a generator starts from the beginning on each new loop, like range:

>>> x = range(10)
>>> sum(x)
45
>>> sum(x)
45

But this was naive (range is not a generator...).
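A quick check in the same spirit (a sketch of the difference): a range object can be iterated any number of times, while a generator expression is consumed by its first full pass:

```python
# range objects can be iterated any number of times
x = range(10)
assert sum(x) == 45
assert sum(x) == 45

# a generator expression is exhausted after one full pass
g = (i for i in range(10))
assert sum(g) == 45
assert sum(g) == 0  # the second pass sees no items
```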

Regarding possible duplicate comment: my problem concerned multiple loops over generator, which is not explained in the other thread.

Upvotes: 4

Views: 2704

Answers (3)

wildwilhelm

Reputation: 5019

Your generator is actually only looping once. Once created with pow2_gen, g stores a generator; the very first time through loop, this generator is consumed and raises StopIteration. On every later pass through loop, next(g) (or g.next() in Python 2) just keeps raising StopIteration, so in effect g represents an empty sequence.
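The exhaustion is easy to see directly with next(): once the generator has raised StopIteration, every further call raises it again, so a for loop over it finishes immediately. A small sketch, using the pow2_gen from the question:

```python
def pow2_gen(n):
    """Generator of powers of 2"""
    for i in range(n):
        yield 2 ** i

g = pow2_gen(2)
assert next(g) == 1
assert next(g) == 2

# every call after exhaustion raises StopIteration again
for _ in range(3):
    try:
        next(g)
    except StopIteration:
        pass
    else:
        raise AssertionError("expected StopIteration")
```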

To make the comparison more fair, you would need to re-create the generator every time you loop.

A further difficulty with the way you've approached this is that you're calling append to build your list, which is likely the very slowest way to construct a list. More often, lists are built with list comprehensions.
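As a rough illustration (exact timings will vary by machine), a comprehension typically builds the same list faster than repeated append calls, since it avoids an attribute lookup and a method call per element:

```python
from timeit import timeit

def build_append(n=1000):
    results = []
    for i in range(n):
        results.append(2 ** i)
    return results

def build_comprehension(n=1000):
    return [2 ** i for i in range(n)]

# both produce identical lists
assert build_append() == build_comprehension()

t_append = timeit(build_append, number=100)
t_comp = timeit(build_comprehension, number=100)
print(t_append, t_comp)  # the comprehension is usually the smaller number
```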

The following code lets us pick apart the timing a bit more carefully. create_list and create_gen create lists and generators, respectively, using list comprehension and generator expressions. time_loop is like your loop method, while time_apply is a version of loop that re-creates the iterable each time through the loop.

def create_list(n=1000):
    return [2**i for i in range(n)]

def create_gen(n=1000):
    return (2**i for i in range(n))

def time_loop(iterator, n=1000):
    for t in range(n):
        for v in iterator:
            pass

def time_apply(create_fn, fn_arg, n=1000):
    for t in range(n):
        iterator = create_fn(fn_arg)
        time_loop(iterator, 1)

print('time_loop(create_list): %.3f' % timeit("time_loop(create_list(1000))",
                                              setup="from __main__ import *",
                                              number=10))

print('time_loop(create_gen): %.3f' % timeit("time_loop(create_gen(1000))",
                                             setup="from __main__ import *",
                                             number=10))

print('time_apply(create_list): %.3f' % timeit("time_apply(create_list, 1000)",
                                               setup="from __main__ import *",
                                               number=10))

print('time_apply(create_gen): %.3f' % timeit("time_apply(create_gen, 1000)",
                                              setup="from __main__ import *",
                                              number=10))

Results on my box suggest that building a list (time_apply(create_list)) is similar in time to (or maybe even faster than) building a generator (time_apply(create_gen)).

time_loop(create_list): 0.244
time_loop(create_gen): 0.028
time_apply(create_list): 21.190
time_apply(create_gen): 21.555

You can see the same effect you've documented in your question, which is that time_loop(create_gen) is an order of magnitude faster than time_loop(create_list). Again, this is because the generator created is only being iterated over once, rather than the many loops over the list.

As you hypothesise, building a list once and iterating over it many times (time_loop(create_list)) is faster than iterating over a generator many times (time_apply(create_gen)) in this particular scenario.

The trade-off between list and generator is going to be strongly dependent on how big the iterator you're creating is. With 1000 items, I would expect lists to be pretty fast. With 100,000 items, things might look different.

print('create big list: %.3f' % timeit("l = create_list(100000)",
                                       setup="from __main__ import *",
                                       number=10))

print('create big gen: %.3f' % timeit("g = create_gen(100000)",
                                      setup="from __main__ import *",
                                      number=10))

Here I'm getting:

create big list: 209.748
create big gen: 0.023

Python uses up between 700 and 800 MB of memory building the big list; the generator uses almost nothing at all. Memory allocation and garbage cleanup are computationally expensive in Python, and predictably make your code slow; generators are a very simple way to avoid gobbling up your machine's RAM, and can make a big difference to runtime.
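The memory difference is easy to see with sys.getsizeof, which here measures only the container or generator object itself (not the integers a list refers to, so the true gap is even larger):

```python
import sys

big_list = [2 ** i for i in range(100000)]
big_gen = (2 ** i for i in range(100000))

list_size = sys.getsizeof(big_list)  # roughly 800 KB of pointers alone
gen_size = sys.getsizeof(big_gen)    # a small, constant-size object

assert gen_size < list_size
print(list_size, gen_size)
```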

Upvotes: 6

Chris_Rands

Reputation: 41248

Your test does not work because your generator is exhausted on the first pass in loop(). This is one of the advantages of lists over generators: you can iterate over them multiple times (at the expense of storing the full list in memory).

Here is an illustration of this. I'm using a generator expression and a list comprehension (which is faster than calling append in a for loop), but the concept is the same:

>>> gen = (i for i in range(3))
>>> for n in range(2):
...     for i in gen:
...         print(i)
... 
0 # 1st print
1
2 # after one loop the iterator is exhausted
>>> 
>>> lst = [x for x in range(3)]
>>> for n in range(2):
...     for i in lst:
...         print(i)
... 
0 # 1st print
1
2
0 # 2nd print
1
2 
>>> 

For an equivalent test you should rebuild the generator after each iteration of the outer loop:

>>> for n in range(2):
...     gen = (i for i in range(3))
...     for i in gen:
...         print(i)
... 
0 # 1st print
1
2
0 # 2nd print
1
2
>>>
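If recreating the generator is impractical (say, it wraps a one-shot data source), itertools.tee is another way to get multiple passes: it hands back independent iterators, at the cost of buffering every item that one copy has seen but another has not. A minimal sketch:

```python
from itertools import tee

gen = (i * i for i in range(4))
first, second = tee(gen, 2)

assert list(first) == [0, 1, 4, 9]
# tee buffered the items, so the second copy can still replay them
assert list(second) == [0, 1, 4, 9]
```

Note that if the two copies are consumed far apart, tee's internal buffer can grow as large as the list you were trying to avoid.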

Upvotes: 3

Dunes

Reputation: 40903

There is a problem with your test. Namely, a generator is not reusable: once exhausted, it cannot be used again, and a new one must be created. For example:

l = [0, 1, 2, 4, 5]
g = iter(l) # creates an iterator over the list (generators are one kind of iterator)

sum_list0 = sum(l)
sum_list1 = sum(l)
assert sum_list0 == sum_list1 # all working normally

sum_gen0 = sum(g) # consumes the iterator
sum_gen1 = sum(g) # sum of an exhausted iterator is 0
assert sum_gen0 == sum_list1 # result is correct
assert sum_gen1 == sum_list1, "second result was incorrect" # fails: the iterator was exhausted

For your test to work you must recreate the generator afresh in the statement you pass to timeit.

from timeit import timeit

n = 1000
repeats = 10000

list_powers = [2**i for i in range(n)]
def gen_powers():
    for i in range(n):
        yield 2**i

time_list = timeit("min(list_powers)", globals=globals(), number=repeats)
time_gen = timeit("min(gen_powers())", globals=globals(), number=repeats)

print("Loops over list took: ", time_list)
print("Loops over generator took: ", time_gen)

gives:

Loops over list took:  0.24689035064701784
Loops over generator took:  13.551637053904571

Now the generator is two orders of magnitude slower than the list. This is to be expected, since the sequence is small compared to the number of iterations over it. If n were large, list creation would become slower, because a list built by appending must be repeatedly reallocated as it grows (its final size is not known when it is created). Increasing the number of iterations favours the list: each extra pass adds work for the generator, which recomputes every element, whereas for the list the per-pass cost stays constant. Since n is only 1000 (small) and repeats dominates n, the generator is slower.
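The reallocation cost mentioned above can be sidestepped by preallocating the list when its final size is known, a sketch (timings vary by machine, so none are claimed here):

```python
from timeit import timeit

n = 100000

def build_by_append():
    results = []
    for i in range(n):
        results.append(i * i)  # list grows and is reallocated as needed
    return results

def build_preallocated():
    results = [0] * n          # final size known up front: no regrowth
    for i in range(n):
        results[i] = i * i
    return results

# both strategies produce the same list
assert build_by_append() == build_preallocated()

print(timeit(build_by_append, number=10),
      timeit(build_preallocated, number=10))
```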

Upvotes: 4
