enc

Reputation: 3425

gnu parallel as a job queue

I have sets of jobs, and the jobs within a set can all run in parallel, so I want to parallelize them for better throughput.

This is what I am currently doing: I wrote a Python script using the multiprocessing library that runs all of the jobs in a set at the same time. Only after every job in a set has finished is the next set (script) invoked. This is inefficient because the jobs in a set have different execution times.

Recently, I learned about GNU parallel and I think it may help improve my script. However, each set of jobs has some pre-processing and post-processing tasks, so it is not possible to run the jobs in an arbitrary order.

In summary, I want to 1) make sure that pre-processing is completed before launching a set's jobs, and 2) run post-processing only after all jobs in a set have completed.

And this is what I am trying to do:

  1. Run a separate script for each set of jobs.
  2. Each script runs its set's pre-processing, after which all jobs in the set are free to run.
  3. Each script registers its jobs in a GNU parallel job queue.
  4. GNU parallel runs the queued jobs in parallel.
  5. Each script monitors whether its own jobs are finished.
  6. When all jobs in a set are done, run that set's post-processing.
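The per-set logic in the steps above can be sketched even in plain bash with background jobs and `wait` (GNU parallel then adds the cross-set slot limiting on top); `preprocess`, `do_job` and `postprocess` here are hypothetical stand-ins for your own commands:

```shell
#!/usr/bin/env bash
# Hypothetical placeholder commands for one set of jobs.
preprocess()  { echo "pre $1"; }
do_job()      { echo "job $1"; }
postprocess() { echo "post $1"; }

run_jobset() {
  local set=$1; shift
  preprocess "$set"       # step 2: must complete before any job starts
  for job in "$@"; do
    do_job "$job" &       # steps 3-4: launch the set's jobs concurrently
  done
  wait                    # step 5: block until this set's own jobs finish
  postprocess "$set"      # step 6: runs only after all jobs are done
}

run_jobset demo a b c
```

The `job …` lines may appear in any order, but `pre demo` is always first and `post demo` always last, which is exactly the ordering constraint the question asks for.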

I am wondering how I can do this with GNU parallel, or whether GNU parallel is even the right tool for the job.

Upvotes: 2

Views: 1124

Answers (1)

Ole Tange

Reputation: 33740

If we assume you are limited by CPU (and not memory or I/O), then this might work:

do_jobset() {
  jobset=$1
  # pre-processing must finish before any job in the set starts
  preprocess $jobset
  # one job per file in the set; --load 100% keeps the CPUs from overloading
  parallel --load 100% do_job ::: $jobset/*
  # parallel only returns once every job in the set is done
  postprocess $jobset
}
export -f do_jobset
# run the sets themselves in parallel, too
parallel do_jobset ::: *.jobset

If do_job does not use a full CPU from the start, but instead takes 10 seconds to load the data to be processed, add --delay 10 before --load 100%.

The alternative is to run each phase for all sets at once:

parallel preprocess ::: *.jobset
parallel do_job ::: jobsets*/*
parallel postprocess ::: *.jobset

Upvotes: 2
