Allis Tauri

Reputation: 31

Multiprocessing manager process does not free memory

In the app I'm developing I use a multiprocessing.BaseManager to do some heavy and complex computations in parallel with the main process. I use a Manager and not a Pool because these computations are implemented as a class and only need to be performed once in a while.

Each time, I create a new instance of the computing class in the manager, call its methods, get back the results, then delete the instance and call gc.collect() in the manager.

Here's some pseudo-code to demonstrate the situation:

import gc
from multiprocessing.managers import BaseManager

class MyComputer(object):
    def compute(self, args):
        # several steps of computations producing a large result
        huge_list = [i * args for i in range(1000000)]  # placeholder for the real work
        return huge_list

class MyManager(BaseManager): pass
MyManager.register('MyComputer', MyComputer)
MyManager.register('gc_collect', gc.collect)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()

    # obtain args_list from the configuration file
    args_list = [1, 2, 3]  # placeholder

    many_results = []
    for args in args_list:
        comp = manager.MyComputer()              # create the computing object in the manager process
        many_results.append(comp.compute(args))  # the result is sent back to the main process
        del comp                                 # drop the proxy so the manager can delete the object
        manager.gc_collect()                     # force a collection in the manager process

    # do something with many_results
The result of a computation is big (200Mb-600Mb). And the problem is: according to top, the resident memory used by the manager process grows significantly (by 50Mb to 1Gb) after each computation. It grows much faster if a single comp object is used for all computations or if manager.gc_collect() is not called. So I guess the object is indeed deleted and the garbage collector works, yet something is still left behind.
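For reference, here is one way to take the same measurement programmatically instead of reading top by hand. This is only an illustration: it assumes the third-party psutil package is installed, and it reaches into BaseManager's private _process attribute.

import psutil  # assumption: third-party psutil is installed

def log_manager_rss(manager, label):
    # manager._process is an implementation detail of BaseManager, used here only for illustration
    rss = psutil.Process(manager._process.pid).memory_info().rss
    print('%s: manager RSS = %.1f MB' % (label, rss / 1024.0 / 1024.0))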

Here's a plot of resident memory used by the Manager process during five rounds of computations: https://i.sstatic.net/38tdo.png

My questions are:

  1. Do I need to search for memory leaks in the MyComputer implementation, or is this just a feature of Python's memory management system?
  2. If the latter is true, are there any means to force the manager process to return its "freed" memory to the OS?

Upvotes: 1

Views: 1921

Answers (2)

Allis Tauri

Reputation: 31

After more than a week of research, I'm answering my own questions:

  1. The described memory usage profile is indeed a feature of Python's memory management system, which does not return memory allocated for small objects to the OS. So if the amount of data produced during a calculation is large, it is preferable to preallocate the object that will contain it (see the sketch after this list). NumPy arrays are an option; maybe built-in arrays too.
  2. No, there are no means to do that. What's more, as I've learned, even in C a free() call does not necessarily cause the memory to be returned to the OS.

Another important conclusion of the investigation:

Notice these huge memory spikes (https://i.sstatic.net/38tdo.png). They're much larger than the size of any result (~250Mb) produced. This, it turned out, is due to the fact that they were pickled-unpickled in the process. Pickling is a very expencive process; its memory usage have a non-linear dependence on the size of an object to be pickled. So if you (un)pickle an object ~10Mb large, it uses ~12-13Mb, but (un)pickling of ~250Mb uses 800-1000Mb! Thus, in order to pickle a big object (which includes any usage of Pipes, Queues, Connections, shelves, etc.), you need to serialize the process somehow.
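One way to sidestep the spike entirely (not part of the answer above, just an illustration) is to never send the big result over the manager connection at all: have the managed object write it to disk and pass back only the file path. The compute_to_file method and the temp-file handling here are hypothetical.

import os
import tempfile
import numpy as np

class MyComputer(object):
    def compute_to_file(self, args):
        # hypothetical variant of compute(): the big result never crosses the pipe
        huge_array = np.arange(args, dtype=np.float64)   # placeholder for the real result
        fd, path = tempfile.mkstemp(suffix='.npy')
        os.close(fd)
        np.save(path, huge_array)
        return path   # only a short string is pickled back to the main process

# in the main process, after comp = manager.MyComputer():
#     path = comp.compute_to_file(args)
#     result = np.load(path)   # loaded from disk, without the pickling spike
#     os.remove(path)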

Upvotes: 1

Vor

Reputation: 35109

It's hard to guess what the problem is, because memory leaks are always hard to find. I would recommend installing memory_profiler if you don't have it already. It can help you find memory problems very easily.

Just an example of how to use it:

test.py

@profile
def foo():
    f = open('CC_2014.csv', 'rb')
    lines_f = f.readlines()*10000   # build a large list to make the allocation visible
    f.close()
    lines_f = None                  # drop the reference so the memory can be freed

foo()

As you can see, I added the @profile decorator to the function I suspect has a memory problem. Then run your script like this:

python -m memory_profiler test.py

And the result is:

Line #    Mem usage    Increment   Line Contents
================================================
     1    9.316 MiB    0.000 MiB   @profile
     2                             def foo():
     3    9.316 MiB    0.000 MiB       f = open('CC_2014.csv', 'rb')
     4  185.215 MiB  175.898 MiB       lines_f = f.readlines()*10000
     5  185.211 MiB   -0.004 MiB       f.close()
     6    9.656 MiB -175.555 MiB       lines_f = None

From this output you can easily see which line eats up a lot of memory.

Upvotes: 0
