Reputation: 2489
Regarding the following example:
import os

NUM_CPUS = None  # defaults to all available

def worker(f1, f2):
    # Big command; cannot run more than K in parallel
    os.system("run program x on f1 and f2")

def test_run(pool):
    filelist = os.listdir(files_dir)
    for f1 in filelist:
        for f2 in filelist:
            pool.apply_async(worker, args=(f1, f2))

if __name__ == "__main__":
    import multiprocessing as mp
    pool = mp.Pool(NUM_CPUS)
    test_run(pool)
    pool.close()
    pool.join()
Each os.system call consumes a lot of resources, so I cannot dispatch more than K (5) in parallel.
Unfortunately, even when setting NUM_CPUS=5, each pool.apply_async call returns immediately.
How can I tell Python not to let more than 5 workers run in parallel?
Upvotes: 0
Views: 506
Reputation: 12205
Your program looks fine to me.
apply_async() is a non-blocking call, but it does not override the pool's limit. If your pool has five workers, at most five tasks run in parallel. The rest of your tasks are queued inside the pool and are dispatched only when a worker becomes free.
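To check this yourself, here is a minimal sketch that records when each task starts and ends and then computes the peak number of overlapping tasks. The names worker, peak_concurrency and run_demo are made up for illustration, and time.sleep stands in for the expensive os.system call. The reported peak should never exceed the pool size.

```python
import multiprocessing as mp
import time


def worker(task_id):
    # Stand-in for the expensive os.system call (hypothetical workload).
    start = time.time()
    time.sleep(0.2)
    return start, time.time()


def peak_concurrency(intervals):
    # Sweep over start (+1) and end (-1) events to find the largest overlap.
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    running = peak = 0
    for _, delta in sorted(events):
        running += delta
        peak = max(peak, running)
    return peak


def run_demo(pool_size=5, num_tasks=20):
    with mp.Pool(pool_size) as pool:
        # Each apply_async call returns at once; the pool queues the tasks.
        pending = [pool.apply_async(worker, (i,)) for i in range(num_tasks)]
        intervals = [p.get() for p in pending]
    return peak_concurrency(intervals)


if __name__ == "__main__":
    print("peak parallel workers:", run_demo())
```

All 20 submissions return immediately, yet each worker process handles one task at a time, so the overlap is bounded by pool_size.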
Your code will block in pool.join() until all workers are finished. If you also need your main program to block while five workers are running, you will have to do something else, but I cannot imagine why you would want that. Pool handles the scheduling in the background and keeps at most 5 workers running, so you do not need to worry about it.
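If you did want the submitting loop itself to wait while five tasks are in flight, one possible approach (an assumption on my part, not something the Pool requires) is to guard apply_async with a semaphore that is released from the task's callback. The names run_throttled and MAX_IN_FLIGHT are hypothetical, and time.sleep again replaces the real command.

```python
import multiprocessing as mp
import threading
import time

MAX_IN_FLIGHT = 5  # hypothetical limit, matching K in the question


def worker(f1, f2):
    # Placeholder for the expensive command.
    time.sleep(0.1)
    return f1, f2


def run_throttled(pairs, limit=MAX_IN_FLIGHT):
    slots = threading.BoundedSemaphore(limit)

    def release(_):
        # Runs in the parent process when a task finishes or fails.
        slots.release()

    with mp.Pool(limit) as pool:
        pending = []
        for f1, f2 in pairs:
            # Blocks the main thread while `limit` tasks are in flight.
            slots.acquire()
            pending.append(
                pool.apply_async(
                    worker, (f1, f2), callback=release, error_callback=release
                )
            )
        pool.close()
        pool.join()
        return [p.get() for p in pending]


if __name__ == "__main__":
    names = ["a", "b", "c"]
    print(run_throttled([(x, y) for x in names for y in names]))
```

This mainly helps when building the task list is itself expensive or unbounded; for a fixed list of files the plain pool is enough.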
Upvotes: 3