Reputation: 2124
I was playing around with the memory_profiler
package (installed from pip); more specifically, I was looking at the memory efficiency of looping through a list by creating a temporary list first vs. looping through an "iterator list".
This is a problem I encountered a while back, and I wanted to benchmark my solution. The problem was that I needed to compare each element in a list with the next element in the same list, until all elements had been "dealt with". So I guess this would be an O(n^2) solution if the most naive approach is picked: for each element in the list, loop through the list.
Anyway, the three functions below all do (more or less) the same thing: loop over a list that is zipped with itself, offset by one.
import cProfile

@profile
def zips():
    li = range(1, 20000000)
    # zip() builds the full list of tuples before the loop even starts
    for tup in zip(li, li[1:]):
        pass
    del li

@profile
def izips():
    from itertools import izip
    li = range(1, 20000000)
    # izip() yields the tuples lazily instead of building a list
    for tup in izip(li, li[1:]):
        pass
    del li

@profile
def izips2():
    from itertools import izip
    li = range(1, 20000000)
    for tup in izip(li, li[1:]):
        # explicitly drop each tuple as soon as it has been used
        del tup
    del li

if __name__ == '__main__':
    zips()
    # izips()
    # izips2()
The surprising part (to me) was the memory usage. First I ran the zips() function, and although I thought I cleaned up after myself, I still ended up with ~1.5 GB in memory:
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
10 @profile
11 27.730 MB 0.000 MB def zips():
12 649.301 MB 621.570 MB li = range(1,20000000)
13 3257.605 MB 2608.305 MB for tup in zip(li,li[1:]):
14 1702.504 MB -1555.102 MB pass
15 1549.914 MB -152.590 MB del li
Then I closed the interpreter instance and reopened it to run the next test, the izips() function:
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
17 @profile
18 27.449 MB 0.000 MB def izips():
19 27.449 MB 0.000 MB from itertools import izip
20 649.051 MB 621.602 MB li = range(1,20000000)
21 1899.512 MB 1250.461 MB for tup in izip(li,li[1:]):
22 1746.922 MB -152.590 MB pass
23 1594.332 MB -152.590 MB del li
And then finally I ran a test (again restarting the interpreter in between) where I tried to explicitly delete each tuple in the for-loop, to try to make sure its memory would be freed (maybe I'm not thinking about that correctly?). Turns out that didn't make a difference, so I'm guessing that either I'm not triggering GC or that's not the source of my memory overhead (a quick gc.collect() sanity check is sketched after the output below).
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
25 @profile
26 20.109 MB 0.000 MB def izips2():
27 20.109 MB 0.000 MB from itertools import izip
28 641.676 MB 621.566 MB li = range(1,20000000)
29 1816.953 MB 1175.277 MB for tup in izip(li,li[1:]):
30 1664.387 MB -152.566 MB del tup
31 1511.797 MB -152.590 MB del li
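For reference, a quick way to rule out the garbage collector (a minimal sketch, not one of the profiled runs): gc.collect() forces a full collection pass and returns the number of unreachable objects it found. In CPython the tuples should already be freed by reference counting, so this should report (close to) nothing:
import gc
from itertools import izip

li = range(1, 20000000)
for tup in izip(li, li[1:]):
    pass
collected = gc.collect()       # force a full collection pass
print 'collected:', collected  # expect ~0: the tuples die by refcounting alone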
Bottom line: I thought the overhead of the for-loop itself was minimal, so I was expecting just a little more than the ~620 MB it takes to store the list, but instead it looks like I have ~2 lists of 20,000,000 elements in memory, plus even more overhead. Can anyone help me explain what all this memory is being used for? (And what is taking up that ~1.5 GB at the end of each run?)
Upvotes: 2
Views: 5485
Reputation: 1124558
Note that the OS assigns memory in chunks, and doesn't necessarily reclaim it all in one go. I've found the memory profiling package to be wildly inaccurate, because it doesn't appear to take that into account.
Your li[1:] slice creates a new list with (2*10**7) - 1 elements, nearly a whole new copy, easily doubling the memory space required for the lists. The zip() call also returns a full new list object, the output of the zipping action, again requiring memory for the intermediary result, plus 20 million 2-element tuples.
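You can see this directly with sys.getsizeof() (a rough sketch; note it only reports the size of each list's pointer array, not the int objects, which the two lists share):
import sys

li = range(1, 20000000)
print sys.getsizeof(li)        # the list's pointer array: roughly 8 bytes/element on 64-bit
print sys.getsizeof(li[1:])    # the slice: a second, nearly equal-sized pointer array
print sys.getsizeof(iter(li))  # a list iterator: a few dozen bytes, regardless of list size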
You could use a new iterator instead of slicing:
def zips():
    from itertools import izip
    li = range(1, 20000000)
    next_li = iter(li)
    next(next_li)  # advance one step
    for tup in izip(li, next_li):
        pass
    del li
The list iterator returned from the iter() call is much more light-weight; it only keeps a reference to the original list and a pointer. Combining this with izip() avoids creating the output list as well.
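If the input can be an arbitrary iterable (so it can't be sliced or iterated over twice), the pairwise() recipe from the itertools documentation achieves the same thing with tee():
from itertools import izip, tee

def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)   # advance the second iterator by one step
    return izip(a, b)

for tup in pairwise(range(1, 20000000)):
    pass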
Upvotes: 3