Gorka

Reputation: 2079

Cancel jobs submitted before a date or with a JOBID lower than a given integer

I have realized that the jobs submitted with a previous version of my software are useless because of a bug, so I want to cancel them. However, I also have newer jobs that I would like to keep running. All the jobs have the same job name and are running in the same partition.

I have written the following script to cancel the jobs with an ID lower than a given one.

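#!/bin/bash
# Cancel all of your jobs whose ID is lower than the given integer.

if [[ "$1" =~ ^[0-9]+$ ]]
then
    MAX_JOBID=$1
else
    echo "An integer value is needed" >&2
    exit 1
fi

# -h suppresses the header line; %F prints the job ID
# (the base job ID for job arrays). sort -un removes the
# duplicates that array tasks would otherwise produce.
JOBIDLIST=$(squeue -h -u "$USER" -o "%F" | sort -un)

for JOBID in $JOBIDLIST
do
    if [ "$JOBID" -lt "$MAX_JOBID" ]
    then
        echo "Cancelling job $JOBID"
        scancel "$JOBID"
    fi
done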

I would say that this is a recurring situation when developing software, and I wonder if there is a direct way to do it using Slurm commands. Alternatively, do you use tricks such as appending the software commit ID to the job name to deal with these situations?

Upvotes: 1

Views: 849

Answers (2)

damienfrancois

Reputation: 59360

In addition to the suggestions by @j23, you can organise your jobs with

Upvotes: 1

j23

Reputation: 3530

Unfortunately, there is no direct way to cancel jobs in such scenarios.

Alternatively, as you pointed out, adding the software version or commit to the job name is useful. In that case you can use scancel --name=JOB_NAME_VERSION to cancel all the jobs with that job name.
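A minimal sketch of that naming scheme, assuming the job script lives in a git repository. The helper names (versioned_job_name, submit_versioned, cancel_version), the base name myjob and the script job.sh are hypothetical; git rev-parse --short HEAD, sbatch --job-name, scancel --user and scancel --name are standard options.

```shell
#!/bin/bash
# Hypothetical helpers: tag each job name with the current git commit.

# Print a job name such as myjob_1a2b3c4 (base name + short commit hash)
versioned_job_name() {
    local base=$1
    local commit
    commit=$(git rev-parse --short HEAD) || return 1
    echo "${base}_${commit}"
}

# Submit a batch script under the versioned job name
submit_versioned() {
    local base=$1 script=$2
    local name
    name=$(versioned_job_name "$base") || return 1
    sbatch --job-name="$name" "$script"
}

# Cancel only your jobs that were submitted from the given commit
cancel_version() {
    local base=$1 commit=$2
    scancel --user="$USER" --name="${base}_${commit}"
}
```

For example, submit_versioned myjob job.sh submits a job named myjob_<commit>; once a commit turns out to be buggy, cancel_version myjob <commit> cancels just the jobs built from it. The short hash does not reflect uncommitted changes in the working tree, so this only helps if you commit before submitting.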

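Also, if the newly submitted jobs are still pending, you can hold them with scontrol hold <jobid> and then cancel the other pending jobs with scancel --user=$USER --state=PENDING. Be aware that held jobs are themselves in the PENDING state, so that filter alone matches them too; exclude them before cancelling.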
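A sketch of the hold-then-cancel idea that skips the held jobs. Since held jobs also show up as PENDING, it selects pending jobs through squeue and drops those whose reason is JobHeldUser (the reason code squeue reports for a user hold; worth confirming on your Slurm version). The function name hold_and_cancel_pending is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: hold the job IDs given as arguments, then cancel
# every other pending job of yours.
hold_and_cancel_pending() {
    local keep
    for keep in "$@"; do
        scontrol hold "$keep"
    done
    # -h drops the header; %i is the job ID, %r the reason it is pending.
    # Jobs held by the user report the reason JobHeldUser and are skipped.
    squeue -h -u "$USER" -t PENDING -o "%i %r" |
        awk '$2 != "JobHeldUser" {print $1}' |
        while read -r jobid; do
            echo "Cancelling pending job $jobid"
            scancel "$jobid"
        done
}
```

After the old jobs are gone, scontrol release <jobid> lets the held jobs be scheduled again. Jobs that are already running are not affected by either step.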

In my case, I used a similar approach to yours: I piped the output of squeue to awk and cancelled the first N jobs I wanted to remove. It's a one-liner.

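Something like this, which cancels your 10 lowest-numbered jobs (--sort=i orders the output by job ID, -o "%i" prints only the ID, and NR>=2 skips the header line):

squeue -u "$USER" --sort=i -o "%i" | awk -v N=10 'NR>=2 && NR<=N+1 {print $1}' | xargs -r /usr/bin/scancel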
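The same squeue-and-filter approach can also target the submission date mentioned in the question title. A minimal sketch, assuming squeue's %V field (submission time) is printed in the default YYYY-MM-DDTHH:MM:SS form, so plain string comparison in awk orders timestamps chronologically; a custom SLURM_TIME_FORMAT would break that assumption. The function name cancel_jobs_before is hypothetical, and a while loop is used instead of xargs so each cancellation is echoed.

```shell
#!/bin/bash
# Hypothetical helper: cancel your jobs submitted before a cutoff time.
# Assumes %V is printed as YYYY-MM-DDTHH:MM:SS (string order = time order).
cancel_jobs_before() {
    local cutoff=$1
    if [ -z "$cutoff" ]; then
        echo "Usage: cancel_jobs_before YYYY-MM-DDTHH:MM:SS" >&2
        return 1
    fi
    # -h drops the header; %F is the job ID (base ID for job arrays).
    # sort -un collapses array tasks that share a base ID.
    squeue -h -u "$USER" -o "%F %V" |
        awk -v cutoff="$cutoff" '$2 < cutoff {print $1}' |
        sort -un |
        while read -r jobid; do
            echo "Cancelling job $jobid"
            scancel "$jobid"
        done
}
```

Usage: source the file, then run cancel_jobs_before 2024-05-01T00:00:00. Cancelling a base array ID cancels every task of that array, including running ones.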

Upvotes: 3
