aright

Reputation: 2124

What is taking up the memory in this for-loop?

I was playing around with the memory_profiler package (installed via pip), more specifically looking at the memory efficiency of looping through a list by first creating a temporary list, versus looping through an "iterator list".

This was a problem I encountered a while back and I wanted to benchmark my solution. The problem was that I needed to compare each element in a list with the next element in the same list, until all elements had been dealt with. So I guess the most naive solution (for each element in the list, loop through the list again) would be O(n^2).

Anyway, the three functions below all do (more or less) the same thing: loop over a list zipped with itself offset by one.
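
Just to illustrate what I mean by "zipped with itself offset by one", here is what the pairing looks like on a tiny list (this is Python 2, so zip() builds the full list of pairs):

>>> li = [1, 2, 3, 4]
>>> zip(li, li[1:])   # pair each element with its successor
[(1, 2), (2, 3), (3, 4)]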

import cProfile  # not actually used; @profile below is provided by memory_profiler when run with -m memory_profiler

@profile
def zips():
    li = range(1,20000000)
    for tup in zip(li,li[1:]):
        pass
    del li

@profile
def izips():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        pass
    del li

@profile
def izips2():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        del tup
    del li



if __name__ == '__main__':
    zips()
    # izips()
    # izips2()

The surprising part (to me) was in the memory usage, first I run the zips() function, and although I thought I did clean up, I still ended up with ~1.5 GB in memory:

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    10                             @profile
    11    27.730 MB     0.000 MB   def zips():
    12   649.301 MB   621.570 MB    li = range(1,20000000)
    13  3257.605 MB  2608.305 MB    for tup in zip(li,li[1:]):
    14  1702.504 MB -1555.102 MB        pass
    15  1549.914 MB  -152.590 MB    del li

Then I close the interpreter instance and reopen it for running the next test, which is the izips() function:

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    17                             @profile
    18    27.449 MB     0.000 MB   def izips():
    19    27.449 MB     0.000 MB    from itertools import izip
    20   649.051 MB   621.602 MB    li = range(1,20000000)
    21  1899.512 MB  1250.461 MB    for tup in izip(li,li[1:]):
    22  1746.922 MB  -152.590 MB        pass
    23  1594.332 MB  -152.590 MB    del li

And then finally I ran a test (again after restarting the interpreter in between) where I explicitly deleted the tuple inside the for-loop, to try to make sure its memory would be freed (maybe I'm not thinking about that correctly?). It turns out that didn't make a difference, so I'm guessing that either I'm not triggering garbage collection or the tuples are not the source of my memory overhead.

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    25                             @profile
    26    20.109 MB     0.000 MB   def izips2():
    27    20.109 MB     0.000 MB    from itertools import izip
    28   641.676 MB   621.566 MB    li = range(1,20000000)
    29  1816.953 MB  1175.277 MB    for tup in izip(li,li[1:]):
    30  1664.387 MB  -152.566 MB        del tup
    31  1511.797 MB  -152.590 MB    del li
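
For what it's worth, one more variant I considered (but have not profiled above) was forcing a collection explicitly, to rule garbage collection in or out. Something along these lines, where izips3 is just a hypothetical extra test function using the standard gc module:

import gc

@profile
def izips3():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        pass
    del li
    gc.collect()  # hypothetical check: force a collection to see if unreachable objects are what is left over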

Bottom line: I thought that the overhead of the for-loop itself was minimal, so I was expecting just a little more than the ~620 MB it takes to store the list, but instead it looks like I have ~2 lists of 20,000,000 elements in memory, plus even more overhead on top of that. Can anyone help me explain what all this memory is being used for? (And what is taking up that ~1.5 GB at the end of each run?)

Upvotes: 2

Views: 5485

Answers (1)

Martijn Pieters

Reputation: 1124558

Note that the OS assigns memory in chunks, and doesn't necessarily reclaim it all in one go. I've found the memory profiling package to be wildly inaccurate because it appears it fails to take that into account.

Your li[1:] slice creates a new list with (2*10**7) - 1 elements, nearly a whole new copy, easily doubling the memory required for the list objects. The zip() call also returns a full new list object, the output of the zipping action, again requiring memory for the intermediate result, plus nearly 20 million 2-element tuples.
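
You can get a rough feel for where the bytes go with sys.getsizeof(). The figures in the comments below are ballpark numbers assuming a 64-bit CPython 2.x build, and getsizeof() only counts each container's own pointer array, not the shared int objects:

import sys

li = range(1,20000000)
tail = li[1:]
zipped = zip(li, tail)

print sys.getsizeof(li)      # ~160 MB of pointers for the original list
print sys.getsizeof(tail)    # ~160 MB again; the slice is a new list of pointers
print sys.getsizeof(zipped)  # ~160 MB more for the result list itself...
# ...plus roughly 20 million 2-element tuples (~72 bytes each on 64-bit builds),
# which is where the bulk of the extra memory goes.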

You could use a new iterator instead of slicing:

def zips():
    from itertools import izip
    li = range(1,20000000)
    next_li = iter(li)
    next(next_li)  # advance one step
    for tup in izip(li, next_li):
        pass
    del li

The list iterator returned from the iter() call is much more light-weight; it only keeps a reference to the original list and a pointer. Combining this with izip() avoids creating the output list as well.
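
For completeness, the pairwise() recipe in the itertools documentation does essentially the same thing with tee(); the two tee'd iterators stay only one step apart, so again no second list is built:

from itertools import izip, tee

def pairwise(iterable):
    # s -> (s0,s1), (s1,s2), (s2,s3), ...  (recipe from the itertools docs)
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

for tup in pairwise(range(1,20000000)):
    pass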

Upvotes: 3
