Reputation: 386
I use a module (that I cannot modify) which contains a method that I need to use. This method returns 10GB of data, but also allocates 8GB of memory that it does not release. I need to use this method at the start of a script that runs for a long time, and I want to make sure the 8GB of memory are released after I run the method. What are my options here?
To be clear, the 8GB do not get reused by the script - i.e. if I create a large numpy array after running the method, extra memory is allocated for that numpy array.
I have considered running the method in a separate process using the multiprocessing module and returning the result, but I run into problems serializing the large result: 10GB cannot be pickled by the default pickler, and even if I force multiprocessing to use pickle protocol 4, pickling has a very large memory overhead. Is there anything else I could do without being able to modify the offending module?
Edit: here is an example
from dataloader import dataloader1
result = dataloader1.get("DATA1")
As I understand it, dataloader is a Python wrapper around some C++ code using pybind11. I do not know much more about its internal workings. The code above results in 18GB being used. If I then run
del result
10GB gets freed up correctly, but 8GB continues being used (with seemingly no Python objects existing any more).
Edit2: If I create a smallish numpy array (e.g. 3GB), memory usage stays at 8GB. If I delete it and instead create a 6GB numpy array, memory usage goes to 14GB and comes back down to 8GB after I delete it. I still need the 8GB released to the OS.
Upvotes: 4
Views: 1102
Reputation: 2166
Python uses two different mechanisms to free memory.
Reference counting, which is the primary mechanism and deallocates an object as soon as it is no longer referenced (e.g. when it goes out of scope).
The garbage collector, which is secondary and collects objects with cyclic references (a -> b -> c -> a). It can be triggered manually with gc.collect(); otherwise Python itself decides when to run it.
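To illustrate the difference between the two mechanisms, here is a minimal sketch for CPython. The Node class and the make_cycle helper are made up for the demo, and the automatic collector is switched off temporarily so the effect of the explicit gc.collect() call is visible.

```python
import gc
import weakref


class Node:
    """Tiny object used to build a reference cycle (hypothetical demo class)."""

    def __init__(self, name):
        self.name = name
        self.next = None


def make_cycle():
    """Build a -> b -> c -> a and return a weak reference to a."""
    a, b, c = Node("a"), Node("b"), Node("c")
    a.next, b.next, c.next = b, c, a
    return weakref.ref(a)


gc.disable()  # keep the automatic collector out of the way for the demo

# Reference counting: a lone object is freed as soon as its last reference goes.
single = Node("single")
single_ref = weakref.ref(single)
del single
freed_by_refcount = single_ref() is None

# Cycle: the local names are gone once make_cycle() returns, but the nodes
# still reference each other, so their reference counts never reach zero.
cycle_ref = make_cycle()
alive_before_collect = cycle_ref() is not None

# An explicit collection finds the unreachable cycle and frees it.
gc.collect()
freed_by_gc = cycle_ref() is None

gc.enable()
print(freed_by_refcount, alive_before_collect, freed_by_gc)
```

Note that both mechanisms only cover Python objects. Memory allocated inside a C++ extension and never handed to Python is outside their reach.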
However, I would highly suggest profiling and changing the code so that it does not use as much memory. Perhaps look into streams, or use a database.
Upvotes: 0
Reputation: 662
If the memory is not released by the gc, it is probably because an object is stored in the class that created it. One option is to find (by profiling) which attribute of the class or instance holds the large object and assign None to it, which may allow the gc to release the memory.
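A rough sketch of that idea follows. Loader, Payload and the _last_result attribute are invented for illustration and are not part of the dataloader module from the question; the real attribute name would have to be found by inspecting or profiling the module. This can only work if the memory is held through a reference that is visible from Python.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large buffer (hypothetical)."""

    def __init__(self, size):
        self.data = bytearray(size)


class Loader:
    """Hypothetical loader that keeps its last result in a class attribute."""

    _last_result = None

    @classmethod
    def get(cls, size):
        cls._last_result = Payload(size)
        return cls._last_result


result = Loader.get(1_000_000)
tracker = weakref.ref(result)  # lets us observe when the payload is freed

# Dropping our own reference is not enough: the class still holds one.
del result
still_alive = tracker() is not None

# Clearing the hidden attribute removes the last reference.
Loader._last_result = None
gc.collect()  # not strictly needed here, but harmless
released = tracker() is None

print(still_alive, released)
```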
Upvotes: 0
Reputation: 10889
Can you modify the function? If the memory is held by some module, try reloading that module with importlib.reload, which should release the memory.
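Here is a small sketch of what a reload does for a pure-Python module. The module leaky_mod and its CACHE global are made up for the demo. Keep in mind that the importlib documentation says it is generally not very useful to reload built-in or dynamically loaded modules, so this may well have no effect on a compiled pybind11 extension.

```python
import gc
import importlib
import sys
import tempfile
import weakref
from pathlib import Path

# Source of a throwaway module that holds a large object in a module global.
MODULE_SOURCE = '''
class Cache:
    def __init__(self):
        self.data = bytearray(1_000_000)

CACHE = Cache()
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "leaky_mod.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        leaky_mod = importlib.import_module("leaky_mod")
        old_cache = weakref.ref(leaky_mod.CACHE)

        # Reloading re-executes the module body, which rebinds CACHE to a
        # new object; the old one is freed if nothing else refers to it.
        importlib.reload(leaky_mod)
        gc.collect()
        old_cache_freed = old_cache() is None
        new_cache_exists = leaky_mod.CACHE is not None
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("leaky_mod", None)

print(old_cache_freed, new_cache_exists)
```

In this demo the reload also allocates a fresh CACHE, so the net memory use does not go down unless the module only fills the global lazily.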
Upvotes: 2