Sebus

Reputation: 430

List job's pending steps

The scenario is this: I allocate resources (2 nodes, 64 CPUs) to a job with salloc:

salloc -N 1-2 -n 64 -c 1 -w cluster-node[2-3] -m cyclic -t 5
salloc: Granted job allocation 1720

Then I use srun to create steps in my job:

for i in (seq 70)
    srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60 &
end

Because I created more steps than there are CPUs available to my job, the extra steps are "pending" until a CPU becomes free.

When I use squeue with the -s option to list steps, I can only see the running ones.

squeue -s -O stepid:12,stepname:10,stepstate:9
1720.0     sleep     RUNNING
[...]
1720.63     sleep     RUNNING

My question is: do steps have states other than RUNNING, like jobs do, and if so, is there a way to view them with squeue (or another command)?

Upvotes: 1

Views: 254

Answers (1)

damienfrancois

Reputation: 59360

I am not sure Slurm can offer that information. One alternative would be to use GNU Parallel so that job steps are not started at all until a CPU is available. In the current setting, all job steps are started at once, and those that do not have a CPU available wait.

So with the same allocation as you use, replace

for i in (seq 70)
    srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60 &
end

with

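seq 70 | parallel -N0 -P $SLURM_NTASKS srun --exclusive -N 1 -n 1 --jobid=1720 sleep 60

Here seq 70 supplies the 70 inputs that parallel needs in order to launch 70 commands, and -N0 makes it run the command once per input without appending the input to the command line. Then the output of squeue should list RUNNING and PENDING steps.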

N.B. not sure the --jobid= option is needed here BTW
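
To check which states squeue actually reports for the steps, one option is to tally the state column. This is only a sketch in POSIX shell (not the fish syntax used in the loop above). count_step_states is a hypothetical helper name, and it assumes the state is the last field on each line, as in the -O format from the question.

```shell
# Hypothetical helper: tally step states from squeue-style lines on stdin.
# Assumption: the state is the last whitespace-separated field of each line.
count_step_states() {
    awk 'NF { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

# Possible usage inside the allocation (-h suppresses the header line):
#   squeue -s -h -O stepid:12,stepname:10,stepstate:9 | count_step_states
```

Any state other than RUNNING that squeue reports would appear as its own line in the tally.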

Upvotes: 1
