PBS array job parallelization

Question

I am trying to submit a job on high compute cluster that needs to run a python code lets say 10000 times. I used gnu parallel but then IT team sent me a mail stating that my job is creating too many ssh login logs in their monitoring system. They asked me to use job arrays instead. My code takes about 12 seconds to run. I believe I need to use #PBS -J statement in my PBS script. Then, I am not sure if it will run in parallel. I need to execute my code lets say on 10 nodes 16 cores each i.e. 160 instances of my code running in parallel. How can I parallelize it i.e. run many instances of my code at a given time utilizing all the resources I have? Below is the initial pbs script with gnu parallel:

#!/bin/bash
#PBS -P My_project
#PBS -N my_job
#PBS -l select=10:ncpus=16:mem=4GB
#PBS -l walltime=01:30:00

module load anaconda
module load parallel

cd $PBS_O_WORKDIR

JOBSPERNODE=16

parallel --joblog jobs.log --wd $PBS_O_WORKDIR -j $JOBSPERNODE --sshloginfile $PBS_NODEFILE --env PATH "python $PBS_O_WORKDIR/xyz.py" :::: inputs.txt

inputs.txt is a fie with integer values 0-9999 in each line which is fed to my python code as an argument. Code is highly independent and output of one instance does not affect another.

PBS array job parallelization

Answers (1)

Related Questions