Francesco Costa

Reputation: 1

How to call a Linux command-line program in parallel with Python

I have a command-line program which runs on a single core. It takes an input file, does some calculations, and produces several files which I need to parse to store the output. I have to call the program several times with different input files, so to speed things up I was thinking parallelization would be useful. Until now I have performed this task by calling every run separately within a loop, using the subprocess module.
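Roughly, my current sequential version looks like this (the program name, file names, and the parse step are placeholders):

import subprocess
from pathlib import Path

input_files = ['case1.inp', 'case2.inp']  # placeholder input files
results = []

for i, input_file in enumerate(input_files):
    run_dir = Path(f'run_{i}')       # a fresh working folder per run
    run_dir.mkdir(exist_ok=True)
    # 'my_program' stands in for the real single-core executable;
    # its output files land in the working directory given by cwd
    subprocess.call(['my_program', input_file], cwd=run_dir)
    # ...parse the files in run_dir here and append the data to results...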

I wrote a script which creates a new working folder on every run, then calls the program with its output directed to that folder, and returns some data which I need to store. My question is: how can I adapt the following code, found here, so that it always runs my script using the indicated number of CPUs and stores the output? Note that each run has a unique running time. Here is the mentioned code:

import subprocess
import multiprocessing as mp
from tqdm import tqdm

NUMBER_OF_TASKS = 4
progress_bar = tqdm(total=NUMBER_OF_TASKS)

def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)


def update_progress_bar(_):
    progress_bar.update()


if __name__ == '__main__':
    pool = mp.Pool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()

Upvotes: 0

Views: 346

Answers (1)

Booboo

Reputation: 44213

I am not entirely clear what your issue is. I have some recommendations for improvement below, but on the page you link to you seem to say that everything works as expected, and I don't see anything very wrong with the code as long as you are running on Linux.

Since the subprocess.call method already creates a new process, you should just be using multithreading to invoke your worker function, work. Had you been using multiprocessing on a platform that uses the spawn method to create new processes (such as Windows), then creating the progress bar outside of the if __name__ == '__main__': block would have resulted in the creation of 4 additional progress bars that did nothing. Not good! So for portability it is best to move its creation inside the if __name__ == '__main__': block.

import subprocess
from multiprocessing.pool import ThreadPool
from tqdm import tqdm


def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)


def update_progress_bar(_):
    progress_bar.update()


if __name__ == '__main__':
    NUMBER_OF_TASKS = 4

    progress_bar = tqdm(total=NUMBER_OF_TASKS)

    pool = ThreadPool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()

Note: If your worker.py program prints to the console, it will mess up the progress bar (the progress bar will be re-written repeatedly on multiple lines).
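If that is an issue, a simple workaround (a sketch, not specific to your worker) is to capture the child's output rather than letting it write to the terminal, e.g. with subprocess.run; the captured text is then available on the returned CompletedProcess object:

def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    # capture_output=True keeps the child's stdout/stderr off the terminal,
    # so it cannot interfere with the progress bar (requires Python 3.7+)
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.stdout  # or log it, parse it, etc.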

Have you considered importing worker.py (some refactoring of that code might be necessary) instead of invoking a new Python interpreter to execute it? In this case you would want to be explicitly using multiprocessing. On Windows this might not save you anything, since a new Python interpreter is executed for each new process anyway, but it could save you time on Linux:

from multiprocessing.pool import Pool
from worker import do_work
from tqdm import tqdm


def update_progress_bar(_):
    progress_bar.update()

if __name__ == '__main__':
    NUMBER_OF_TASKS = 4
    progress_bar = tqdm(total=NUMBER_OF_TASKS)
    pool = Pool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(do_work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()
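
One last point: you said you also need to store what each run produces. apply_async returns an AsyncResult object whose get method gives you the worker's return value (and re-raises any exception raised in the worker). Assuming do_work returns the parsed data, you could replace the loop and the close/join calls above with something like:

    async_results = [
        pool.apply_async(do_work, (seconds,), callback=update_progress_bar)
        for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]
    ]
    pool.close()
    pool.join()
    # each get() returns what do_work returned for that task
    all_results = [result.get() for result in async_results]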

Upvotes: 1
