lifezbeautiful

Reputation: 1337

How to utilise GNU parallel efficiently?

I have a script, say parallellise.sh, whose contents are 10 different Python calls, shown below:

python3.8 script1.py
python3.8 script2.py
.
.
.
python3.8 script10.py

Now, I run it with GNU parallel:

nohup parallel -j 5 < parallellise.sh &

It starts as expected: 5 processors are in use and the first 5 scripts, script1.py ... script5.py, are running. Then I notice that some of them (say two of them, script1.py and script2.py) complete very quickly, whereas the others need more time.

Now there are unused resources (2 processors) while the remaining 3 scripts (script3.py, script4.py, and script5.py) finish, and only then are the next 5 loaded. Is there a way to use these resources by starting new commands as soon as existing ones complete?

For information: my OS is CentOS.

Upvotes: 1

Views: 180

Answers (1)

Ole Tange

Reputation: 33748

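As @RenaudPacalet says, there is nothing else to do: GNU parallel already starts the next job as soon as a running job finishes, so with -j 5 a free slot should be refilled immediately.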

So there is something in your scripts which causes this not to happen.
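For comparison, here is a rough Python analogue of the slot-filling behaviour you should be seeing. It is not GNU parallel itself, only a sketch using a thread pool with 5 workers; the durations and the helper names run_job and simulate are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(name, seconds, log, t0):
    # Record when the job gets a slot and when it releases it.
    start = time.monotonic() - t0
    time.sleep(seconds)  # stand-in for the real work of the script
    log[name] = (start, time.monotonic() - t0)


def simulate(durations, slots=5):
    """Run the jobs with at most `slots` of them active at once."""
    log = {}
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=slots) as pool:
        for name, seconds in durations.items():
            pool.submit(run_job, name, seconds, log, t0)
    return log


# Hypothetical durations: scripts 1 and 2 are fast, 3 to 5 are slow.
seconds = [0.1, 0.1, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1]
durations = {f"script{i}.py": s for i, s in enumerate(seconds, start=1)}

log = simulate(durations)
for name, (start, end) in sorted(log.items(), key=lambda item: item[1][0]):
    print(f"{name}: started at {start:.2f}s, finished at {end:.2f}s")
```

In this sketch script6.py gets a slot as soon as script1.py or script2.py finishes, well before script3.py is done. If your real run does not look like that, the cause is most likely in the scripts.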

To help debug you can use:

parallel --lb --tag < parallellise.sh

and maybe add a "Starting X" line at the beginning of scriptX.py and a "Finishing X" line at the end of scriptX.py so you can see that the scripts are indeed finishing.
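A minimal sketch of those marker lines, assuming each scriptX.py has (or can be given) a main() function. The helper log_marker is hypothetical, not something from your scripts. The flush=True matters: Python buffers stdout when it is not a terminal, so without it the lines may only show up when the script exits.

```python
import os
import sys
import time


def log_marker(event, script_name, stream=sys.stdout):
    """Write a timestamped marker line and flush it immediately."""
    stamp = time.strftime("%H:%M:%S")
    line = f"{stamp} {event} {script_name} (pid {os.getpid()})"
    print(line, file=stream, flush=True)
    return line


def main():
    # Placeholder for the real work of scriptX.py.
    pass


if __name__ == "__main__":
    name = os.path.basename(sys.argv[0])
    log_marker("Starting", name)
    main()
    # Only reached if main() returns normally, so a missing
    # "Finishing" line points at a crash or a hang.
    log_marker("Finishing", name)
```

With --tag each of these lines is prefixed with the command that produced it, so you can match the Starting and Finishing lines of each script.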

Without knowing anything about scriptX.py it is impossible to say what is causing this.

(Instead of nohup, consider using tmux or screen, so the jobs run in the background but you can still check in on them and see their output. nohup is not ideal for debugging.)

Upvotes: 1
