Python generator vs list as array initializer

Question

Here's an example of initializing an array of ten million random numbers, using a list (a) and using a tuple-like generator (b). The result is exactly the same, and the list or tuple is never used afterwards, so there's no practical advantage to one over the other:

from random import randint
from array import array

a = array('H', [randint(1, 100) for _ in range(0, 10000000)])
b = array('H', (randint(1, 100) for _ in range(0, 10000000)))

So the question is which one to use. In principle, my understanding is that a tuple should be able to get away with using fewer resources than a list, but since this list and tuple are not kept, it should be possible for the code to execute without ever initializing the intermediate data structure… My tests indicate that the list is slightly faster in this case. I can only imagine that this is because the Python implementation has more optimization around lists than tuples. Can I expect this to be consistent?

More generally, should I use one or the other, and why? (Or should I do this kind of initialization in some other way entirely?)

Update: Answers and comments made me realize that the b example is not actually a tuple but a generator, so I edited the headline and the text above a bit to reflect that. I also tried splitting the list version into two lines like this, which should force the list to actually be instantiated:

g = [randint(1, 100) for _ in range(0, 10000000)]
a = array('H', g)

It appears to make no difference. The list version takes about 8.5 seconds, and the generator version takes about 9 seconds.
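
A comparison along these lines can be reproduced with timeit. This is only a sketch: N is deliberately much smaller than the ten million used above so that it runs quickly, the function names are made up for illustration, and the absolute numbers (and possibly the ranking) will vary with the machine and the Python version.

```python
import timeit
from array import array
from random import randint

N = 100_000  # assumption: scaled down from ten million to keep the run short


def from_list():
    # Builds the whole list first, then copies it into the array.
    return array('H', [randint(1, 100) for _ in range(N)])


def from_generator():
    # Feeds the array one element at a time from a generator expression.
    return array('H', (randint(1, 100) for _ in range(N)))


# Take the best of three single runs for each variant.
t_list = min(timeit.repeat(from_list, number=1, repeat=3))
t_gen = min(timeit.repeat(from_generator, number=1, repeat=3))

print(f"list comprehension:   {t_list:.3f} s")
print(f"generator expression: {t_gen:.3f} s")
```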

iBug · Accepted Answer

[randint(1, 100) for _ in range(0, 10000000)]

This is a list comprehension. Every element is evaluated in a tight loop and put together into a list, so it is generally faster but takes more RAM (everything comes out at once).

(randint(1, 100) for _ in range(0, 10000000))

This is a generator expression. No element is evaluated at this point; elements come out one at a time as you call next() on the resulting generator. It's slower but takes a consistent (small) amount of memory.
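
The difference in evaluation order can be made visible with a small sketch. The tracked() helper and the calls list are hypothetical names introduced only to count how many elements have been computed so far; sys.getsizeof reports the size of the container object itself, not of the elements it refers to.

```python
import sys

calls = []


def tracked(i):
    # Record every evaluation so we can see when elements are computed.
    calls.append(i)
    return i * i


# List comprehension: all 1000 elements are computed immediately.
squares_list = [tracked(i) for i in range(1000)]
list_calls = len(calls)

calls.clear()

# Generator expression: nothing is computed until an element is requested.
squares_gen = (tracked(i) for i in range(1000))
gen_calls_before = len(calls)
first = next(squares_gen)
gen_calls_after = len(calls)

print(list_calls, gen_calls_before, gen_calls_after)
print(sys.getsizeof(squares_list), sys.getsizeof(squares_gen))
```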

As given in the other answer, if you want a tuple, you should convert either into one:

tuple([randint(1, 100) for _ in range(0, 10000000)])
tuple(randint(1, 100) for _ in range(0, 10000000))

Let's come back to your question:

When to use which?

In general, if you use a list comprehension or generator expression as an initializer of another sequential data structure (list, array, etc.), it makes no difference except for the memory-time tradeoff mentioned above. All you need to consider is your performance and memory budget. Prefer the list comprehension if you need more speed (or write a C program to be absolutely fast), or the generator expression if you need to keep memory consumption low.
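
As a rough illustration of that tradeoff, the sketch below builds the same array both ways and records peak allocation with tracemalloc. The peak_bytes() helper, the seed, and the value of N are assumptions made for the example, and the measured figures are specific to CPython and will differ between versions.

```python
import random
import tracemalloc
from array import array

N = 100_000  # assumption: scaled down from the question's ten million


def peak_bytes(build):
    # Run build() while tracing allocations; return its result and the peak.
    tracemalloc.start()
    result = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


# Reseed before each build so both variants draw the same random numbers.
random.seed(42)
from_list, peak_list = peak_bytes(
    lambda: array('H', [random.randint(1, 100) for _ in range(N)]))

random.seed(42)
from_gen, peak_gen = peak_bytes(
    lambda: array('H', (random.randint(1, 100) for _ in range(N))))

print("same contents:", from_list == from_gen)
print("peak bytes (list comprehension):  ", peak_list)
print("peak bytes (generator expression):", peak_gen)
```

The two arrays compare equal; only the intermediate memory use differs, because the list version holds all N element references before the array is filled.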

If you plan to reuse the resulting sequence, things start to get interesting.

A list is strictly a list, and can for all purposes be used as a list:

a = [i for i in range(5)]
a[3]  # 3
a.append(5)            # a = [0, 1, 2, 3, 4, 5]
for _ in a:
    print("Hello")
                       # Prints 6 lines in total
for _ in a:
    print("Bye")
                       # Prints another 6 lines
b = list(reversed(a))  # b = [5, 4, 3, 2, 1, 0]

A generator can be only used once.

a = (i for i in range(5))
a[3]                   # TypeError: 'generator' object is not subscriptable
a.append(5)            # AttributeError: 'generator' object has no attribute 'append'
for _ in a:
    print("Hello")
                       # Prints 5 lines in total
for _ in a:
    print("Bye")
                       # Nothing this time, because
                       # the generator has already been consumed
b = list(reversed(a))  # TypeError: 'generator' object is not reversible
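
The snippet above is meant to be read line by line, since the first error would stop a real script. A runnable sketch of the same behaviour, with the failing operations wrapped in try/except, might look like this (the variable names are illustrative only):

```python
gen = (i for i in range(5))

# The first pass consumes the generator; the second pass finds nothing left.
first_pass = [x for x in gen]
second_pass = [x for x in gen]
print(first_pass)   # [0, 1, 2, 3, 4]
print(second_pass)  # []

# Operations that work on a list but fail on a generator.
gen2 = (i for i in range(5))
errors = []
for action in (lambda: gen2[3], lambda: gen2.append(5), lambda: reversed(gen2)):
    try:
        action()
    except (TypeError, AttributeError) as exc:
        errors.append(type(exc).__name__)
print(errors)       # ['TypeError', 'AttributeError', 'TypeError']

# If the values are needed more than once, materialize them into a list first.
items = list(i for i in range(5))
print(items[3], list(reversed(items)))  # 3 [4, 3, 2, 1, 0]
```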

The final answer is: Know what you want to do, and find the appropriate data structure for it.
