Reputation: 155
I just wrote a Python script that uses multi-threading, invoked like this:
python myScript.py -cpu_n 5 -i input_file
To run the command on my hundreds of input files, I am generating a list (commands.list) with one command per file:
python myScript.py -cpu_n 5 -i input_file1
python myScript.py -cpu_n 5 -i input_file2
python myScript.py -cpu_n 5 -i input_file3
...
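(How commands.list is generated isn't shown above; a minimal sketch, assuming the input files all match the pattern input_file* in the current directory, could be:)

```shell
# Hypothetical generator for commands.list; assumes the inputs are
# named input_file1, input_file2, ... in the current directory.
for f in input_file*; do
    echo "python myScript.py -cpu_n 5 -i $f"
done > commands.list
```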
And I'm trying to schedule them with GNU parallel, using 10 CPUs on each of three different machines:
parallel -S 10/$server1 -S 10/$server2 -S 10/$server3 < commands.list
My question is: what is the maximum number of CPUs that will be used on each server with this parallel command? Will it be 5*10 = 50, or just 10 CPUs?
Upvotes: 3
Views: 1947
Reputation: 33685
From man parallel:
--jobs N
-j N
--max-procs N
-P N
        Number of jobslots on each machine. Run up to N jobs in
        parallel. 0 means as many as possible. Default is 100% which
        will run one job per CPU core on each machine.
-S [@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]
        ...
        GNU parallel will determine the number of CPU cores on the
        remote computers and run the number of jobs as specified by
        -j. If the number ncpu is given GNU parallel will use this
        number for number of CPU cores on the host. Normally ncpu
        will not be needed.
So your command will run up to 10 jobs on each server in parallel.
Whether each of your commands will actually use 5 CPU cores is unclear. If each command does use 5 cores, then 50 cores per server will be busy, and in that case I recommend that you do not use the ncpu/sshlogin syntax, but instead use:
parallel -j 20% -S $server1,$server2,$server3 < commands.list
This way you can mix servers that have different numbers of cores: GNU Parallel detects the core count on each server and starts one job per 5 cores (i.e. 20% of the cores become jobslots).
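As a worked illustration of what -j 20% means (the 40-core figure here is hypothetical, not from the question):

```shell
# Hypothetical server with 40 cores: -j 20% turns 20% of the detected
# cores into jobslots, so 8 jobs run concurrently, and 8 jobs * 5
# threads each keep all 40 cores busy.
cores=40
jobslots=$(( cores * 20 / 100 ))
echo "$jobslots"   # prints 8
```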
Upvotes: 2