rinspy

Reputation: 386

Forcing memory to release after running a function

I use a module (that I cannot modify) which contains a method that I need to use. This method returns 10GB of data, but also allocates 8GB of memory that it does not release. I need to use this method at the start of a script that runs for a long time, and I want to make sure the 8GB of memory are released after I run the method. What are my options here?

To be clear, the 8GB is not reused by the script: if I create a large numpy array after running the method, extra memory is allocated for that numpy array.

I have considered running the method in a separate process using the multiprocessing module (and returning the result), but I ran into problems serializing its large result: 10GB cannot be pickled by the default pickler, and even if I force multiprocessing to use pickle protocol 4, pickling has a very large memory overhead. Is there anything else I could do without being able to modify the offending module?
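A minimal sketch of the separate-process idea that avoids pickling the result: the child process writes the result to a temporary .npy file in NumPy's raw binary format, the parent reads it back, and whatever the loader leaked is reclaimed by the OS when the child exits. This swaps multiprocessing for a plain subprocess and assumes the result is (or converts to) a NumPy array of a non-object dtype and that there is enough free disk space for it. The helper name load_in_subprocess and the convention that the snippet must bind a variable called result are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

import numpy as np

# Appended to the caller's snippet: saves `result` to the path given as the first
# command-line argument in .npy format (no pickling for non-object dtypes).
_SAVE_RESULT = "\nimport sys as _sys, numpy as _np\n_np.save(_sys.argv[1], _np.asarray(result))\n"


def load_in_subprocess(snippet):
    """Run `snippet` in a fresh interpreter and return the array it binds to `result`.

    Hypothetical helper: any memory the loader fails to release belongs to the
    child process and goes back to the OS when that process exits.
    """
    fd, path = tempfile.mkstemp(suffix=".npy")
    os.close(fd)
    try:
        # sys.executable: same interpreter, so the same installed packages.
        # With "python -c code arg", sys.argv[1] in the child is `arg`.
        subprocess.run([sys.executable, "-c", snippet + _SAVE_RESULT, path], check=True)
        # Reads the whole file into RAM. np.load(path, mmap_mode="r") would map it
        # instead, but then the file must be kept until the array is no longer used.
        return np.load(path)
    finally:
        os.remove(path)


# The loader call from the question, to be run out of process.
LOADER_SNIPPET = 'from dataloader import dataloader1\nresult = dataloader1.get("DATA1")\n'
```

Usage would be result = load_in_subprocess(LOADER_SNIPPET). The child still peaks at about 18GB, but it has exited before the parent loads the 10GB array, so the parent never holds the extra 8GB.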

Edit: here is an example

from dataloader import dataloader1
result = dataloader1.get("DATA1")

As I understand it, dataloader is a Python wrapper around some C++ code using pybind11. I do not know much more about its internal workings. The code above results in 18GB being used. If I then run

del result

10GB is freed correctly, but 8GB remains in use (even though no Python objects seem to exist any more).

Edit2: If I create a smallish numpy array (e.g. 3GB), memory usage stays at 8GB. If I delete it and instead create a 6GB numpy array, memory usage goes to 14GB and comes back down to 8GB after I delete it. I still need the 8GB released to the OS.
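To check whether memory has actually been returned to the OS, rather than merely freed inside the process, one can watch the process's resident set size. A minimal Linux-only sketch, assuming /proc is available; the helper name current_rss_bytes is hypothetical.

```python
def current_rss_bytes():
    """Return this process's resident set size in bytes (Linux only, via /proc)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmRSS not found in /proc/self/status")


if __name__ == "__main__":
    before = current_rss_bytes()
    data = b"x" * (200 * 1024 * 1024)  # 200 MiB, written so the pages are resident
    during = current_rss_bytes()
    del data
    after = current_rss_bytes()
    print(f"before={before >> 20} MiB, during={during >> 20} MiB, after={after >> 20} MiB")
```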

Upvotes: 4

Views: 1102

Answers (3)

Simon Kocurek

Reputation: 2166

Python uses 2 different mechanisms to free memory.

  1. Reference counting, the primary mechanism, which frees an object as soon as nothing refers to it any more (e.g. when the last variable referring to it goes out of scope).

  2. The garbage collector, a secondary mechanism that collects objects involved in reference cycles (a -> b -> c -> a). It can be triggered manually with gc.collect(); otherwise Python decides when to run it.
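The difference between the two mechanisms can be observed with weak references: an object without cycles disappears the moment its last reference is deleted, while a reference cycle survives until the cycle collector runs. A minimal sketch for CPython; the Node class is illustrative only. Note that gc.collect() only knows about Python objects, so memory allocated directly by C or C++ code behind a wrapper is outside its reach.

```python
import gc
import weakref


class Node:
    """Illustrative object that can point at another Node."""

    def __init__(self):
        self.other = None


def demo():
    # 1. Reference counting: freed as soon as the last reference goes away.
    a = Node()
    probe = weakref.ref(a)
    del a
    freed_by_refcount = probe() is None

    # 2. A cycle keeps both refcounts above zero, so only the collector frees it.
    was_enabled = gc.isenabled()
    gc.disable()  # stop an automatic collection from running mid-demo
    try:
        x, y = Node(), Node()
        x.other, y.other = y, x
        probe = weakref.ref(x)
        del x, y
        alive_before_collect = probe() is not None
        gc.collect()  # manual trigger of the cycle collector
        freed_by_gc = probe() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, alive_before_collect, freed_by_gc


print(demo())  # (True, True, True) in CPython
```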

However, I would strongly suggest profiling the code and changing it so that it does not use as much memory. Perhaps look into streaming the data, or use a database.
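As an illustration of the streaming suggestion: instead of holding the whole dataset in memory, the data can be processed in fixed-size chunks so that only one chunk is resident at a time. A minimal sketch, assuming the data is available as a 2-D array in a .npy file on disk (the question does not say this); the function names are hypothetical.

```python
import numpy as np


def iter_chunks(path, rows_per_chunk):
    """Yield consecutive row blocks of a .npy file without loading it all at once."""
    # mmap_mode="r" maps the file; only the slices actually touched are read in.
    data = np.load(path, mmap_mode="r")
    for start in range(0, data.shape[0], rows_per_chunk):
        # np.array(...) copies the slice into an ordinary in-memory array.
        yield np.array(data[start:start + rows_per_chunk])


def streamed_column_sums(path, rows_per_chunk=100_000):
    """Column sums of a 2-D array computed one chunk at a time."""
    total = None
    for chunk in iter_chunks(path, rows_per_chunk):
        part = chunk.sum(axis=0)
        total = part if total is None else total + part
    return total
```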

Upvotes: 0

dtrckd

Reputation: 662

If the memory is not released by the GC, it is probably because an object is still stored in the class (or instance) that created it. One option is to find that large attribute on the instance (by profiling) and set it to None, which may allow the memory to be released.
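A minimal sketch of that search, assuming the object keeps its state in an ordinary instance __dict__ (pybind11 classes may not, in which case this finds nothing). Sizes come from the nbytes attribute for NumPy-like arrays and from sys.getsizeof otherwise; getsizeof is shallow, so containers are undercounted. The helper name largest_attributes is hypothetical.

```python
import sys


def largest_attributes(obj, top=5):
    """Return up to `top` (name, approx_bytes) pairs for obj's instance attributes."""
    sizes = []
    for name, value in getattr(obj, "__dict__", {}).items():
        # Arrays report their buffer size via nbytes; fall back to a shallow size.
        nbytes = getattr(value, "nbytes", None)
        size = nbytes if isinstance(nbytes, int) else sys.getsizeof(value)
        sizes.append((name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Once the culprit is known, setattr(obj, name, None) (or del on that attribute) drops the reference, and the old value is freed if nothing else points to it.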

Upvotes: 0

Christian Sauer

Reputation: 10889

Can you modify the function? If the memory is held by some module, try reloading that module with importlib.reload, which should release the memory.
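A minimal sketch of what reloading does, using a throwaway pure-Python module written to a temporary directory (the module name leaky_mod and its _cache list are hypothetical): importlib.reload re-executes the module's code in the same module object, so module-level globals are rebound and the objects they used to hold can be freed if nothing else references them. The Python documentation notes that reloading built-in or dynamically loaded modules is generally not very useful, so for a compiled pybind11 extension this may not release anything.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = (
    "_cache = []\n"
    "def load():\n"
    "    # Stands in for a library that quietly keeps a large buffer around.\n"
    "    _cache.append(bytearray(10_000_000))\n"
    "    return 'data'\n"
)


def demo_reload():
    """Return how many buffers leaky_mod holds before and after a reload."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "leaky_mod.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the new file visible to the import system
        try:
            mod = importlib.import_module("leaky_mod")
            mod.load()
            held_before = len(mod._cache)
            mod = importlib.reload(mod)  # re-runs the module code, rebinding _cache = []
            held_after = len(mod._cache)
        finally:
            sys.path.remove(tmp)
    return held_before, held_after
```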

Upvotes: 2
