Reputation: 1526
In my bash script I have the following command:
ulimit -s unlimited
However, when I launch my job with sbatch job.sh and then ssh into one of the compute nodes to check the stack size with ulimit -a, I clearly see that the stack size is:
stack size (kbytes, -s) 8192
This is my full script:
#!/bin/bash -l
#SBATCH --job-name=test
#SBATCH --nodes=13
#SBATCH --ntasks-per-node=32
#SBATCH --mem=120GB
#SBATCH --time=999:99:00
#SBATCH --propagate=STACK
ulimit -s unlimited
mpirun ./pres.a
Upvotes: 0
Views: 4179
Reputation: 74445
ulimit is a shell built-in command. Resource limits set with it are not system-wide and only apply to processes started in the same shell session and their descendants. When you SSH into a node and execute ulimit, it shows you the limits in that particular shell session, not the limits applied to the processes in the job, even if some of them are running on the same node.
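If you do SSH into a node and want to see the limits that apply to one of the job's actual processes rather than to your login shell, you can read them from /proc (the PID here is hypothetical and the values are only illustrative):
$ grep -i 'max stack' /proc/12345/limits
Max stack size            8388608              unlimited            bytes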
Also, --propagate=STACK propagates the resource limits of the shell session where you execute the sbatch command, not the limits set in the job script:
PropagateResourceLimits
A list of comma separated resource limit names. The slurmd daemon uses these names to obtain the associated (soft) limit values from the user's process environment on the submit node. These limits are then propagated and applied to the jobs that will run on the compute nodes.
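You can check which limits your cluster propagates by default by inspecting the SLURM configuration; the values shown below are only an example and will differ per cluster:
$ scontrol show config | grep -i PropagateResourceLimits
PropagateResourceLimits         = ALL
PropagateResourceLimitsExcept   = (null)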
Thus, the ulimit -s unlimited inside the job script only applies to the shell process started by SLURM when the job is executed, and unless mpirun propagates the limits further onto the processes it spawns, they will inherit the system default stack size limit instead. But if you do:
$ ulimit -s unlimited
$ sbatch --propagate=STACK foo.sh
(or have #SBATCH --propagate=STACK inside foo.sh as you do), then all processes spawned by SLURM for that job will already have their stack size limit set to unlimited.
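To verify that the limit really reaches the job's processes, print it from within the job itself instead of from a separate SSH session, e.g. by adding a line like this to the job script (a minimal check, assuming one task per node is enough to inspect each node):
srun --ntasks-per-node=1 bash -c 'echo "$(hostname): $(ulimit -s)"'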
Upvotes: 2