Reputation: 2348
Here's an example of initializing an array of ten million random numbers, using a list (a) and using a tuple-like generator (b). The result is exactly the same, and the list or tuple is never used afterwards, so there's no practical advantage to one or the other:
from random import randint
from array import array
a = array('H', [randint(1, 100) for _ in range(0, 10000000)])
b = array('H', (randint(1, 100) for _ in range(0, 10000000)))
So the question is which one to use. In principle, my understanding is that a tuple should be able to get away with using fewer resources than a list, but since this list and tuple are not kept, it should be possible for the code to execute without ever initializing the intermediate data structure… My tests indicate that the list is slightly faster in this case. I can only imagine that this is because the Python implementation has more optimization around lists than tuples. Can I expect this to be consistent?
More generally, should I use one or the other, and why? (Or should I do this kind of initialization some other way completely?)
Update: Answers and comments made me realize that the b example is not actually a tuple but a generator, so I edited the headline and the text above a bit to reflect that. I also tried splitting the list version into two lines like this, which should force the list to actually be instantiated:
g = [randint(1, 100) for _ in range(0, 10000000)]
a = array('H', g)
It appears to make no difference. The list version takes about 8.5 seconds, and the generator version takes about 9 seconds.
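For anyone who wants to reproduce the comparison, here is a minimal sketch using timeit. The function names and the smaller size N are assumptions made for illustration, not part of the original test, and the actual timings will vary by machine and Python version.

```python
from array import array
from random import randint
from timeit import timeit

N = 100_000  # assumed smaller size so the comparison finishes quickly


def build_from_list(n=N):
    # The list comprehension materializes all n values before array() sees them.
    return array('H', [randint(1, 100) for _ in range(n)])


def build_from_generator(n=N):
    # The generator expression hands values to array() one at a time.
    return array('H', (randint(1, 100) for _ in range(n)))


for func in (build_from_list, build_from_generator):
    seconds = timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```

Running each variant several times and comparing the totals gives a steadier picture than a single run.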
Upvotes: 1
Views: 417
Reputation: 37317
[randint(1, 100) for _ in range(0, 10000000)]
This is a list comprehension. Every element is evaluated in a tight loop and put together into a list, so it is generally faster but takes more RAM (everything comes out at once).
(randint(1, 100) for _ in range(0, 10000000))
This is a generator expression. No element is evaluated at this point, and one of them comes out at a time when you call next() on the resulting generator. It's slower but takes a consistent (small) amount of memory.
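A small sketch of that difference, assuming a reduced size n so it runs instantly. The variable names are made up for illustration, and sys.getsizeof only measures the container object itself, so treat the numbers as a rough indication of scale.

```python
import sys
from random import randint

n = 100_000  # assumed size, small enough to run instantly

as_list = [randint(1, 100) for _ in range(n)]   # all values computed now
as_gen = (randint(1, 100) for _ in range(n))    # nothing computed yet

# getsizeof reports only the container, not the int objects it refers to,
# but that is enough to show the difference in scale.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Each call to next() makes the generator compute exactly one more value.
first = next(as_gen)
print(first)
```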
As given in the other answer, if you want a tuple, you should convert either into one:
tuple([randint(1, 100) for _ in range(0, 10000000)])
tuple(randint(1, 100) for _ in range(0, 10000000))
Let's come back to your question:
In general, if you use a list comprehension or generator expression as an initializer of another sequential data structure (list, array, etc.), it makes no difference except for the memory-time tradeoff mentioned above. All you need to consider is performance and memory budget. You would prefer the list comprehension if you need more speed (or write a C program to be absolutely fast), or the generator expression if you need to keep the memory consumption low.
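One way to check the memory side of that tradeoff is the standard tracemalloc module. The following is a sketch under assumptions: the helper peak_bytes and the size n are hypothetical, and the exact figures depend on the interpreter and platform.

```python
import tracemalloc
from array import array
from random import randint


def peak_bytes(build):
    """Return the peak traced memory, in bytes, while build() runs."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


n = 100_000  # assumed size, kept small because tracing slows execution

list_peak = peak_bytes(lambda: array('H', [randint(1, 100) for _ in range(n)]))
gen_peak = peak_bytes(lambda: array('H', (randint(1, 100) for _ in range(n))))
print(f"list comprehension peak: {list_peak} bytes")
print(f"generator expression peak: {gen_peak} bytes")
```

On CPython the list comprehension version would be expected to show the higher peak, because the intermediate list and the array both exist at the same time.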
If you plan to reuse the resulting sequence, things start to get interesting.
A list is strictly a list, and can for all purposes be used as a list:
a = [i for i in range(5)]
a[3] # 3
a.append(5) # a = [0, 1, 2, 3, 4, 5]
for _ in a:
    print("Hello")
# Prints 6 lines in total
for _ in a:
    print("Bye")
# Prints another 6 lines
b = list(reversed(a)) # b = [5, 4, 3, 2, 1, 0]
A generator can only be used once:
a = (i for i in range(5))
a[3] # TypeError: 'generator' object is not subscriptable
a.append(5) # AttributeError: 'generator' object has no attribute 'append'
for _ in a:
    print("Hello")
# Prints 5 lines in total
for _ in a:
    print("Bye")
# Nothing this time, because
# the generator has already been consumed
b = list(reversed(a)) # TypeError: 'generator' object is not reversible
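If you do need several passes over generated values, two common workarounds are to create a fresh generator each time or to materialize it into a list once. A minimal sketch, where the helper name numbers is hypothetical:

```python
def numbers():
    # Hypothetical helper: each call returns a brand-new generator.
    return (i for i in range(5))


gen = numbers()
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # already exhausted, so this is empty

fresh = list(numbers())   # a new generator starts from the beginning again
kept = list(numbers())    # or materialize once and reuse the list freely
backwards = list(reversed(kept))
print(first_pass, second_pass, fresh, backwards)
```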
The final answer is: Know what you want to do, and find the appropriate data structure for it.
Upvotes: 1
Reputation: 31416
Although it looks like one, (randint(1, 100) for _ in range(0, 1000000)) is not a tuple, it's a generator:
>>> type((randint(1, 100) for _ in range(0, 1000000)))
<class 'generator'>
>>>
If you really want a tuple, use:
b = array('H', tuple(randint(1, 100) for _ in range(0, 1000000)))
The list being a bit faster than the generator makes sense: the generator produces the next value only when asked, one at a time, and each request carries some overhead, while the list comprehension builds the whole list in one tight loop. That optimisation for speed is paid for in memory space, since all the values exist at once.
I'd favour the generator, since it will work within most reasonable memory restrictions and for any number of random numbers, while the speedup of the list is minimal. The exception is if you need to generate this list again and again, at which point the speedup would start to count, but then you'd probably reuse the same copy of the list each time to begin with.
Upvotes: 2