Reputation: 21
I have submitted my job on a Linux cluster that uses SLURM to schedule jobs, but the time limit of each partition is only 24 hours (this limit is set by the admin), and it seems that my code needs to run for more than a week (as per my guess). I am new to SLURM scripts and understand very little about the interplay between the following:
#SBATCH --nodes=
#SBATCH --ntasks-per-node=
#SBATCH --ntasks=
#SBATCH --ntasks-per-core=
I am looking for a way to work around the time limit when submitting the job so that my complete job can run.
Suggestions are appreciated.
Upvotes: 2
Views: 4456
Reputation: 525
For anyone getting here, I would suggest looking at the "singleton" dependency. I found a good example at the following link, which I am pasting below.
Example taken from https://researchcomputing.princeton.edu/support/knowledge-base/slurm
#!/bin/bash
#SBATCH --job-name=LongJob # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G # memory per node (4G per cpu-core is default)
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --dependency=singleton # job dependency
#SBATCH --mail-type=begin # send email when job begins
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu
module purge
module load anaconda3/2020.11
conda activate galaxy-env
python myscript.py
Notice the line #SBATCH --dependency=singleton: it tells SLURM that only one job with this job name (for this user) may run at a time, so each submission waits for the previous one to finish.
Then submit the script multiple times, like so:
$ sbatch job.slurm # step 1
$ sbatch job.slurm # step 2
$ sbatch job.slurm # step 3
$ sbatch job.slurm # step 4
$ sbatch job.slurm # step 5
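If you want to queue all the restarts at once, a short loop does the same thing as typing sbatch repeatedly (assuming the script above is saved as job.slurm):
#!/bin/bash
# Submit the same script five times; because of --dependency=singleton
# (same job name, same user), SLURM runs them one after another.
for i in {1..5}; do
    sbatch job.slurm
done
Note that this only helps if whatever myscript.py does can pick up where the previous run stopped, for example by writing and re-reading a checkpoint file.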
Upvotes: 0
Reputation: 528
The time limit is set by the admin and is defined in slurm.conf at /etc/slurm/slurm.conf; the partition definitions there carry the limit, and I am afraid you cannot bypass it.
So the only thing you can do is checkpointing: modify the program so that it periodically saves its state and can resume from that state on the next run. Most programs that are meant to run for long durations should provide something like this.
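As a rough sketch of that checkpoint-and-resubmit pattern (the program ./my_solver, its --resume flag, the files checkpoint.dat and finished.flag, and the script name job_checkpoint.slurm are all placeholders for whatever your own program provides):
#!/bin/bash
#SBATCH --job-name=LongJob
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=24:00:00            # the partition's hard limit

# Placeholder names throughout: my_solver, --resume, checkpoint.dat, finished.flag.
# Stop the program after 23.5 hours so there is time left to resubmit.
if [ -f checkpoint.dat ]; then
    timeout 23.5h ./my_solver --resume checkpoint.dat
else
    timeout 23.5h ./my_solver
fi

# If the program has not written its "finished" marker yet, queue another run.
if [ ! -f finished.flag ]; then
    sbatch job_checkpoint.slurm
fi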
It also seems you are from Nepal; if you happen to be running this on the Kathmandu University HPC, you can ask the administrators there, and they should be able to help you.
Regarding your second question:
#SBATCH --nodes=
#SBATCH --ntasks-per-node=
#SBATCH --ntasks=
#SBATCH --ntasks-per-core=
--nodes means the number of physical nodes.
For the ntasks-related options, I recommend you look at this question: What does the --ntasks or -n tasks does in SLURM?
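As a small, hypothetical example of how these directives combine (./my_program is just a placeholder):
#!/bin/bash
#SBATCH --nodes=2              # request 2 physical nodes
#SBATCH --ntasks-per-node=4    # run 4 tasks (processes) on each node
#SBATCH --cpus-per-task=1      # give each task 1 CPU core

# With the settings above, SLURM starts 2 x 4 = 8 tasks in total;
# you would normally set either --ntasks or --ntasks-per-node, not both.
srun ./my_program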
Upvotes: 1