Reputation: 31
In the app I'm developing I use a multiprocessing.BaseManager to do some heavy and complex computations in parallel with the main process. I use a Manager and not a Pool because these computations are implemented as a class and only need to be performed once in a while.
Each time, I create a new instance of the computing class in the manager, call its methods, get the results back, then delete the instance and call gc.collect() in the manager.
Here's some pseudo-code to demonstrate the situation:
import gc
from multiprocessing.managers import BaseManager

class MyComputer(object):
    def compute(self, args):
        # several steps of computations; huge_list is the big result
        return huge_list

class MyManager(BaseManager): pass

MyManager.register('MyComputer', MyComputer)
MyManager.register('gc_collect', gc.collect)
if __name__ == '__main__':
    manager = MyManager()
    manager.start()

    # obtain args_list from the configuration file
    many_results = []
    for args in args_list:
        comp = manager.MyComputer()
        many_results.append(comp.compute(args))
        del comp
        manager.gc_collect()

    # do something with many_results
The result of a computation is big (200 MB to 600 MB). And the problem is: according to top, the resident memory used by the manager process grows significantly (by 50 MB to 1 GB) after each computation. It grows much faster if a single comp object is used for all computations, or if manager.gc_collect() is not called. So I guess the object is indeed deleted and the garbage collector does run, yet something is still left behind.
Here's a plot of resident memory used by the Manager process during five rounds of computations: https://i.sstatic.net/38tdo.png
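The numbers for such a plot can be sampled with something like the following sketch; it assumes the third-party psutil package and peeks at the manager's private _process attribute to get the server's pid, so treat it as a rough illustration rather than a recommended API:

import psutil  # third-party, assumed to be installed

def manager_rss_mb(manager):
    # BaseManager keeps its server process in the private _process attribute
    pid = manager._process.pid
    return psutil.Process(pid).memory_info().rss / (1024.0 * 1024.0)

# inside the loop from the code above:
#     print('manager RSS after this round: %.0f MB' % manager_rss_mb(manager))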
My questions are: why does the manager process keep so much memory after the computations are done, and how can I get it back?
Upvotes: 1
Views: 1921
Reputation: 31
After more than a week of research, I'm answering my own questions:
Another important conclusion of the investigation:
Notice the huge memory spikes in the plot (https://i.sstatic.net/38tdo.png). They are much larger than the size of any result (~250 MB) produced. This, it turned out, is because the results are pickled and unpickled as they pass between the processes. Pickling is a very expensive operation, and its memory usage grows non-linearly with the size of the object being pickled: (un)pickling an object of ~10 MB uses ~12-13 MB, but (un)pickling ~250 MB uses 800-1000 MB! So if you have to pickle a big object (which includes any usage of Pipes, Queues, Connections, shelves, etc.), you need to restructure that step somehow.
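The effect can be reproduced with a minimal standalone sketch, using only the standard library on a Unix system; resource.getrusage reports the process's peak RSS, and the list size below is arbitrary:

import pickle
import resource

def peak_rss_mb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

if __name__ == '__main__':
    big = [float(i) for i in range(5000000)]  # roughly 150-200 MB of Python floats
    print('peak RSS with the list built: %.0f MB' % peak_rss_mb())

    blob = pickle.dumps(big, pickle.HIGHEST_PROTOCOL)
    print('pickled size: %.0f MB' % (len(blob) / 1024.0 / 1024.0))
    print('peak RSS after pickling: %.0f MB' % peak_rss_mb())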
Upvotes: 1
Reputation: 35109
It's hard to guess what the problem is, because memory leaks are always hard to find. I would recommend installing memory_profiler if you don't have it already. It can help you find the memory problem very easily.
Just an example of how to use it:
@profile
def foo():
    f = open('CC_2014.csv', 'rb')
    lines_f = f.readlines()*10000
    f.close()
    lines_f = None

foo()
As you can see, I added the @profile decorator to the function I suspect has a memory problem.
Then run your script like this:
python -m memory_profiler test.py
And the result is:
Line #    Mem usage     Increment   Line Contents
================================================
     1    9.316 MiB     0.000 MiB   @profile
     2                              def foo():
     3    9.316 MiB     0.000 MiB       f = open('CC_2014.csv', 'rb')
     4  185.215 MiB   175.898 MiB       lines_f = f.readlines()*10000
     5  185.211 MiB    -0.004 MiB       f.close()
     6    9.656 MiB  -175.555 MiB       lines_f = None
From this output you can easily see which lines eat up a lot of memory.
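In your case compute() runs inside the manager's server process, so it may be easier to import the decorator explicitly instead of relying on python -m memory_profiler. A sketch, assuming memory_profiler is installed for the interpreter the manager uses:

# the decorator prints line-by-line memory stats each time compute() returns
from memory_profiler import profile

class MyComputer(object):
    @profile
    def compute(self, args):
        # several steps of computations (as in the question)
        return huge_list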
Upvotes: 0