Reputation: 551
I'm calling a function with a large memory overhead on a list of N files. The large memory overhead is due to a number of factors that cannot be resolved without modifying the function; however, I have worked around the leaking memory using the multiprocessing module. By creating a subprocess for each of the N files and then calling pool.close(), the memory from the function is released with minimal overhead. I have achieved this in the following example:
import multiprocessing as mp

def my_function(n):
    do_something(file=n)  # the memory-heavy call on file n
    return

if __name__ == '__main__':
    for n in range(N):
        # Initialize a single-worker pool for each file
        pool = mp.Pool(processes=1)
        results = pool.map(my_function, [n])
        pool.close()
        pool.join()
This does exactly what I want: by setting processes=1 in the pool, one file is run at a time for the N files. After each file n, I call pool.close() and pool.join(), which shut down the worker process and release its memory back to the OS. Before, I didn't use multiprocessing at all, just a for loop, and the memory would accumulate until my system crashed.
My questions are:

1. Is this the correct way to implement this?
2. Is there a better way to implement this?
3. Is there a way to run more than one process at a time (processes>1), and still have the memory released after each n?

I'm just learning about the multiprocessing module. I've found many multiprocessing examples here, but none specific to this problem. I'll appreciate any help I can get.
Upvotes: 0
Views: 372
Reputation: 43495
Is this the correct way to implement this?
"Correct" is a value judgement in this case. One could consider this either a bodge or a clever hack.
Is there a better way to implement this?
Yes. Fix my_function so that it doesn't leak memory.
If a Python function is leaking lots of memory, chances are you are doing something wrong.
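As an illustration only (the actual cause inside do_something is unknown here), one common pattern is holding references at module level, which keeps every result alive across calls, whereas keeping state local lets Python free it:

# Hypothetical example of one common "leak": a module-level list
# keeps every result referenced for the life of the process.
_cache = []

def leaky(path):
    data = open(path, 'rb').read()
    _cache.append(data)      # references accumulate across calls
    return len(data)

def not_leaky(path):
    with open(path, 'rb') as f:
        data = f.read()      # local reference only
    return len(data)         # data can be freed after return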
Is there a way to run more than one process at a time (processes>1), and still have the memory released after each n?
Yes. Use the maxtasksperchild argument when creating a Pool.
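A minimal sketch (reusing my_function and N from the question; the worker count of 4 is arbitrary): with maxtasksperchild=1, each worker process is retired and replaced after finishing a single task, so several files are processed in parallel while any memory a task leaks is still returned to the OS when its worker exits.

import multiprocessing as mp

if __name__ == '__main__':
    # 4 workers run in parallel; each worker is replaced after
    # one task, so leaked memory is released when it exits.
    pool = mp.Pool(processes=4, maxtasksperchild=1)
    results = pool.map(my_function, range(N))
    pool.close()
    pool.join()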
Upvotes: 1