DsCpp

Reputation: 2489

Python multiprocessing with os.system: no more than K parallel commands

Regarding the following example:

import os
import multiprocessing as mp

NUM_CPUS = None  # defaults to all available
files_dir = "."  # placeholder, not in the original post: directory with the input files

def worker(f1, f2):
    # Big command, cannot run more than K in parallel
    os.system("run program x on f1 and f2")

def test_run(pool):
    filelist = os.listdir(files_dir)
    # Queue one task for every ordered pair of files
    for f1 in filelist:
        for f2 in filelist:
            pool.apply_async(worker, args=(f1, f2))

if __name__ == "__main__":
    pool = mp.Pool(NUM_CPUS)
    test_run(pool)
    pool.close()
    pool.join()

Each os.system call consumes a lot of resources, so I cannot dispatch more than K (5) in parallel. Unfortunately, even when setting NUM_CPUS=5, each pool.apply_async call returns immediately. How can I tell Python not to let more than 5 workers run in parallel?

Upvotes: 0

Views: 506

Answers (1)

Hannu

Reputation: 12205

Your program looks fine to me.

apply_async() is a non-blocking call, but it does not override the pool's limits. If your pool has five workers, at most five tasks run in parallel. The remaining tasks are queued inside the pool and dispatched only when a worker becomes free.
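
To see the queueing for yourself, here is a minimal sketch that counts how many tasks are ever running at once. It is an illustration, not your program: peak_concurrency and its parameters are names I made up, time.sleep stands in for the os.system call, and it uses multiprocessing.pool.ThreadPool (which exposes the same apply_async/close/join interface as mp.Pool) so the counter can live in shared memory.

```python
import threading
import time
from multiprocessing.pool import ThreadPool  # same interface as mp.Pool


def peak_concurrency(pool_size, num_tasks, task_seconds=0.05):
    """Submit num_tasks via apply_async and return the highest number
    of tasks that were running at the same moment."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker(_):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(task_seconds)  # stand-in for the expensive os.system call
        with lock:
            state["running"] -= 1

    pool = ThreadPool(pool_size)
    for i in range(num_tasks):
        pool.apply_async(worker, args=(i,))  # returns immediately, task is queued
    pool.close()
    pool.join()  # blocks until every queued task has run
    return state["peak"]


if __name__ == "__main__":
    # All 20 submissions return at once, yet the peak should not exceed 5.
    print(peak_concurrency(pool_size=5, num_tasks=20))
```

The submission loop finishes almost instantly, but the reported peak stays within the pool size, which is the behaviour described above.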

Your code will block in pool.join() until all workers have finished. If you also need your main program to block while five workers are running, you will need to do something else, but I cannot imagine why you would want that. The pool handles the scheduling in the background and keeps at most five workers running, so you do not need to worry about it.
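
If you really do want the submitting loop to pause while five tasks are in flight, one possible approach (an assumption on my part, not something the pool requires) is to guard apply_async with a semaphore that is released from the completion callbacks. run_bounded is a hypothetical helper name, and the sketch again uses ThreadPool so it runs as-is; with mp.Pool the same pattern should apply, provided the worker function is defined at module level so it can be pickled.

```python
import threading
from multiprocessing.pool import ThreadPool  # same interface as mp.Pool


def run_bounded(func, arg_tuples, limit=5):
    """Call func(*args) for each tuple, blocking the submitting thread
    whenever `limit` tasks are submitted but not yet finished.
    Returns the results in completion order."""
    slots = threading.BoundedSemaphore(limit)
    results = []

    def on_done(value):
        results.append(value)
        slots.release()  # free a slot so the loop below can continue

    def on_error(exc):
        slots.release()  # free the slot even if the task failed

    pool = ThreadPool(limit)
    for args in arg_tuples:
        slots.acquire()  # blocks here while `limit` tasks are in flight
        pool.apply_async(func, args=args,
                         callback=on_done, error_callback=on_error)
    pool.close()
    pool.join()
    return results


if __name__ == "__main__":
    pairs = [(i, i) for i in range(12)]
    print(sorted(run_bounded(lambda a, b: a + b, pairs, limit=5)))
```

For the original problem this extra machinery is unnecessary, since the pool already caps the number of running workers.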

Upvotes: 3
