Reputation: 10407
I know that this question has been asked a lot of times here and elsewhere, but I'm still trying to figure out why iterators are recommended over lists when we have a large dataset.
In this question, people talk about the memory and time advantages of using iterators instead of lists, but without giving low-level arguments.
In the mail pointed to by the accepted answer, it's written:
Iterators have a tiny constant size while lists take space proportional to the length of the list. The part that is not obvious is that looping over the iterator re-uses the same memory location again and again. So the relevant data is almost always in the hardware memory cache.
But why do iterators take a tiny constant size, and why does looping over an iterator re-use the same memory location again and again?
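For concreteness, here is what I can measure on CPython (a minimal sketch; the exact byte counts are 64-bit CPython implementation details):

import sys

data = list(range(10**6))   # the list stores one pointer per element
it = iter(data)             # the iterator stores only a reference to the list and a position

print(sys.getsizeof(data))  # roughly 8 MB on 64-bit CPython
print(sys.getsizeof(it))    # a few dozen bytes, regardless of the list's length

This shows the "tiny constant size" part of the claim, but not the memory-reuse part, which is what I don't understand.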
Upvotes: 0
Views: 186
Reputation: 311968
But why do iterators take a tiny constant size, and why does looping over an iterator re-use the same memory location again and again?
Let's say you are reading lines from a file. If you were to create a list from all the lines in the file:
lines = myfile.readlines()
for line in lines:
...
...this loads the entire file into memory. If the file is sufficiently large, you will consume all the available memory and your program will crash.
On the other hand, if you use the iterator:
for line in myfile:
...
Then Python only needs to read in enough data to find the next EOL character. This uses substantially less memory as long as you are working with a line-oriented file (if the file has no EOL characters, then of course there is no advantage in this example).
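Conceptually, the loop above behaves roughly like the following sketch (not CPython's actual implementation; "big.log" and process() are just placeholders):

with open("big.log") as myfile:   # hypothetical file name
    while True:
        line = myfile.readline()  # reads only up to the next EOL
        if not line:              # an empty string signals end of file
            break
        process(line)             # process() stands in for your own code
        # on the next pass, "line" is rebound; the old string becomes
        # garbage and its memory can be reused for the next line

Since only one line is alive at a time, the loop keeps cycling through the same small pool of memory, which is the cache-friendliness the quoted mail is describing.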
The same reasoning applies, for example, to Python 2's xrange() vs range() (the latter returns a list, which consumes a large amount of memory if the range is large, while the former only needs to maintain a counter).
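To make "only needs to maintain a counter" concrete, here is a toy generator that mimics xrange() (a sketch; the real xrange() is implemented in C):

def my_xrange(stop):          # toy stand-in for xrange(): keeps a single counter
    i = 0
    while i < stop:
        yield i               # hand out the current value...
        i += 1                # ...then advance the counter; no list is ever built

total = 0
for i in my_xrange(10**6):    # constant memory, no matter how large stop is
    total += i
print(total)                  # 499999500000

Python 2's range(10**6), by contrast, has to allocate and fill a million-element list before the loop can even start.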
Upvotes: 3