Why passing a list as a parameter performs better than passing a generator?

Question

I was making an answer for this question, and when I tested the timing for my solution I came up with a contradiction to what I thought was correct.

The guy who made the question wanted to find a way to know how many different lists were contained within another list. (for more information, you can check the question)

My answer was basically this function:

def how_many_different_lists(lists):
    s = set(str(list_) for list_ in lists)
    return len(s)

Now, the situation came when I measured the time it takes to run and I compared it against basically the same function, but passing a list instead of a generator as a parameter to set():

def the_other_function(lists):
    s = set([str(list_) for list_ in lists])
    return len(s)

This is the decorator I use for testing functions:

import time

def timer(func):
    def func_decorated(*args):
        start_time = time.clock()
        result = func(*args)   
        print(time.clock() - start_time, "seconds")
        return result
    return func_decorated

And this were the results for the given input:

>>> list1 = [[1,2,3],[1,2,3],[1,2,2],[1,2,2]]
>>> how_many_different_lists(list1)
6.916326725558974e-05 seconds
2
>>> the_other_function(list1)
3.882067261429256e-05 seconds
2

Even for larger lists:

# (52 elements)
>>> list2= [[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2]]
>>> how_many_different_lists(list2)
0.00023560132331112982 seconds
2
>>> the_other_function(list2)
0.00021329059177332965 seconds
2

Now, my question is: Why is the second example faster than the first one? Aren't generators supposed to be faster due to the fact that the produce the elements "on demand"? I used to think that making a list and iterating through it was slower.

PS: I have tested many many times getting basically the same results.

kederrac · Accepted Answer

I have been benchmarking your functions:

from simple_benchmark import BenchmarkBuilder
from random import choice

b = BenchmarkBuilder()
from operator import setitem


@b.add_function()
def how_many_different_lists(lists):
    s = set(str(list_) for list_ in lists)
    return len(s)


@b.add_function()
def the_other_function(lists):
    s = set([str(list_) for list_ in lists])
    return len(s)


@b.add_arguments('Number of lists in the list')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp

        yield size,  [list(range(choice(range(100)))) for _ in range(size)]


r = b.run()
r.plot()

Generators are lazy because generator expression will create the items on the fly in comparison with list comprehension which will create the entire list in memory. You can read more here: Generator Expressions vs. List Comprehension

As you can see from the benchmark there is not such a big difference between them.

Why passing a list as a parameter performs better than passing a generator?

Answers (1)

Related Questions