Reputation: 2647
I have an expensive function to perform on many independent objects that is trivially parallel, so I'm trying to use the multiprocessing module. However, the memory consumption seems to be on a runaway up-and-to-the-right trajectory. See the attached image below.
Essentially I have a list of paths to large binary objects, and a class that I instantiate with this list. In this class's __iter__ method, I read a file from disk and yield it. The idea is that I iterate through this list of objects (which reads each file into memory) and perform some expensive operation on it. Below is some sample code to simulate this. I'm using np.random.rand(100,100) to simulate reading a large file into memory, and the simulated expensive function just indexes the [0,0] element of the matrix.
import numpy as np
from pathos.multiprocessing import ProcessingPool as Pool
from memory_profiler import profile

class MyClass:
    def __init__(self, my_list):
        self.name = 'foo'
        self.my_list = my_list

    def __iter__(self):
        for item in self.my_list:
            yield np.random.rand(100,100)

def expensive_function(foo):
    foo[0,0]

my_list = range(100000)
myclass = MyClass(my_list)
iter(myclass)  # returns a generator; no arrays are created yet

p = Pool(processes=4, maxtasksperchild=50)
p.map(expensive_function, iter(myclass), chunksize=100)
The issue can be seen in the plot. The memory consumption just seems to climb and climb. I would expect the total memory consumption to be ~4x the consumption of each individual child process, but that doesn't seem to be the case.
What's causing this runaway memory usage, and how do I fix it?
Upvotes: 0
Views: 187
Reputation: 2414
Each time a child invokes expensive_function, it receives a new np.random.rand(100,100) array from MyClass.__iter__. These arrays are generated and held in the main process, so of course the memory usage continues to grow; the child processes can't clean them up, because they live in the parent process. Note how the peak is a little under 8 GiB, which is about how much data you should expect to generate: 100000 arrays × 100×100 entries × 8 bytes per entry ≈ 8 GB (roughly 7.5 GiB).
Upvotes: 1