Reputation: 1526
In my bash script I have the following command:
ulimit -s unlimited
However, when I launch my job with sbatch job.sh and then ssh into one of the compute nodes to check the stack size with ulimit -a, I clearly see that the stack size is:
stack size (kbytes, -s) 8192
This is my full script:
#!/bin/bash -l
#SBATCH --job-name=test
#SBATCH --nodes=13
#SBATCH --ntasks-per-node=32
#SBATCH --mem=120GB
#SBATCH --time=999:99:00
#SBATCH --propagate=STACK
ulimit -s unlimited
mpirun ./pres.a
Upvotes: 0
Views: 4179
Reputation: 74445
ulimit is a shell built-in command. Resource limits set with it are not system-wide and only apply to processes started in the same shell session and their descendants. When you SSH into a node and execute ulimit, it shows you the limits in that particular shell session, not the limits applied to the processes in the job, even if some of them are running on the same node.
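If you do SSH into a node and want to see the limits that apply to one of the job's actual processes rather than to your login shell, you can read them from /proc (the PID here is hypothetical and the values are only illustrative):
$ grep -i 'max stack' /proc/12345/limits
Max stack size            8388608              unlimited            bytes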
Also, --propagate=STACK propagates the resource limits of the shell session where you execute the sbatch command, not the limits set in the job script:
PropagateResourceLimits
A list of comma separated resource limit names. The slurmd daemon uses these names to obtain the associated (soft) limit values from the user's process environment on the submit node. These limits are then propagated and applied to the jobs that will run on the compute nodes.
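You can check which limits your cluster propagates by default by inspecting the SLURM configuration; the values shown below are only an example and will differ per cluster:
$ scontrol show config | grep -i PropagateResourceLimits
PropagateResourceLimits         = ALL
PropagateResourceLimitsExcept   = (null)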
Thus, the ulimit -s unlimited inside the job script only applies to the shell process started by SLURM when the job is executed, and unless mpirun propagates the limits further onto the processes it spawns, they will inherit the system default stack size limit instead. But if you do:
$ ulimit -s unlimited
$ sbatch --propagate=STACK foo.sh
(or have #SBATCH --propagate=STACK inside foo.sh as you do), then all processes spawned by SLURM for that job will already have their stack size limit set to unlimited.
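To verify that the limit really reaches the job's processes, print it from within the job itself instead of from a separate SSH session, e.g. by adding a line like this to the job script (a minimal check, assuming one task per node is enough to inspect each node):
srun --ntasks-per-node=1 bash -c 'echo "$(hostname): $(ulimit -s)"'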
Upvotes: 2