Reputation: 2117
I have an array of data to handle, and a handler that takes a long time to execute (1-2 minutes) and uses a lot of memory for its calculations.
raw = ['a', 'b', 'c']

def handler(r):
    # do something long
Since handler requires a lot of memory, I want to execute it in a separate subprocess and kill it after execution to release the memory. Something like the following snippet:
from multiprocessing import Process

for r in raw:
    process = Process(target=handler, args=(r,))
    process.start()
The problem is that this approach immediately starts len(raw) processes at once, and that's not good.
Also, there is no need to exchange any data between the subprocesses; they just need to run consecutively. Therefore it would be great to run a few processes at the same time and start a new one as soon as an existing one finishes.
How could this be implemented (if it's possible at all)?
Upvotes: 4
Views: 4223
Reputation: 140307
To run your processes sequentially, just join each process within the loop:
from multiprocessing import Process

for r in raw:
    process = Process(target=handler, args=(r,))
    process.start()
    process.join()
That way you're sure that only one process is running at a time (no concurrency), and since each Process exits once handler returns, its memory is released back to the operating system.
That's the simplest way. To run more than one process while limiting the number of processes running at the same time, you can use a multiprocessing.Pool object and apply_async.
I've built a simple example which computes the square of the argument and simulates heavy processing:
from multiprocessing import Pool
import time

def target(r):
    time.sleep(5)
    return r * r

raw = [1, 2, 3, 4, 5]

if __name__ == '__main__':
    with Pool(3) as p:  # 3 processes at a time
        reslist = [p.apply_async(target, (r,)) for r in raw]
        for result in reslist:
            print(result.get())
Running this I get:
<5 seconds wait, time to compute the results>
1
4
9
<5 seconds wait, 3 processes max can run at the same time>
16
25
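Since the original goal was to release memory after each run, note that Pool workers are long-lived by default and are reused across tasks, so memory allocated by one task can stay with the worker process. The Pool constructor accepts a maxtasksperchild argument that makes each worker exit and be replaced after handling that many tasks, which releases its memory back to the OS. Here is a minimal sketch of that variant, assuming a handler like the one in the question that takes one item and returns a result:

from multiprocessing import Pool

raw = ['a', 'b', 'c']

def handler(r):
    # placeholder for the long, memory-hungry computation
    return r

if __name__ == '__main__':
    # 3 workers at a time; maxtasksperchild=1 restarts each worker
    # after a single task, so its memory is freed when it exits
    with Pool(3, maxtasksperchild=1) as p:
        reslist = [p.apply_async(handler, (r,)) for r in raw]
        for result in reslist:
            print(result.get())

This keeps the concurrency limit of the example above while also guaranteeing that no worker outlives a single call to handler.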
Upvotes: 4