Ben

Reputation: 1377

Can I cancel just the array job in Slurm?

I have a bunch of jobs running as an array job in Slurm:

123_[1-500] PD my_job 0:00 me
123_2       R  my_job 9:99 me
123_3       R  my_job 9:99 me
123_4       R  my_job 9:99 me
123_5       R  my_job 9:99 me
...

As I read the man page for scancel, it seems to indicate that executing scancel 123 will stop everything.

Am I wrong, or is there another way to stop just the array job? I want the already running jobs to finish; I just don't want any more jobs created by 123, and I really don't want to have to figure out which jobs need to be re-run if I accidentally kill them midway.

Upvotes: 11

Views: 4159

Answers (2)

damienfrancois

Reputation: 59350

You can issue scancel with the additional --state option:

 scancel --state=PENDING 123

or, in short:

 scancel -t PD 123

That will cancel only the jobs of the 123 array that are still pending and will leave the already started ones running.
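
If you want to guard against typos before cancelling anything, the command above can be wrapped in a small shell function. This is only a sketch: the function name cancel_pending and the DRY_RUN variable are made up for illustration and are not part of Slurm; the only Slurm call used is scancel --state=PENDING from above.

```shell
# Hypothetical helper (not part of Slurm): cancel only the pending
# tasks of an array job, leaving running tasks untouched.
# Set DRY_RUN=1 to print the command instead of running it.
cancel_pending() {
    job_id="$1"
    # Accept only a plain numeric job ID such as 123.
    case "$job_id" in
        ''|*[!0-9]*)
            echo "usage: cancel_pending <numeric_job_id>" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show what would be executed without touching the queue.
        echo "scancel --state=PENDING $job_id"
    else
        scancel --state=PENDING "$job_id"
    fi
}
```

Running DRY_RUN=1 cancel_pending 123 only prints the scancel line; cancel_pending 123 actually runs it. To preview which tasks would be affected, squeue -j 123 -t PD lists the pending part of the array.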

Upvotes: 18

Ben

Reputation: 1377

I put a hold on the job:

 scontrol hold 123

Once all the jobs reported by squeue were done, I was able to cancel it.
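
The same hold, wait, cancel sequence could be scripted roughly as below. Treat it as a sketch under assumptions: the function name hold_then_cancel and the POLL_SECONDS variable are invented for this example, and it only checks for tasks in the RUNNING state, so tasks in other transient states (for example completing) are not waited for.

```shell
# Hypothetical helper (not part of Slurm): hold the array job so no new
# tasks start, wait for the running tasks to drain, then cancel the rest.
hold_then_cancel() {
    job_id="$1"
    [ -n "$job_id" ] || { echo "usage: hold_then_cancel <job_id>" >&2; return 1; }

    # Stop pending tasks from being scheduled.
    scontrol hold "$job_id" || return 1

    # squeue -h suppresses the header, so empty output means
    # no task of this job is currently in the RUNNING state.
    while [ -n "$(squeue -h -j "$job_id" -t RUNNING)" ]; do
        sleep "${POLL_SECONDS:-60}"
    done

    # Only held, pending tasks should remain at this point.
    scancel "$job_id"
}
```

Calling hold_then_cancel 123 blocks until the running tasks are gone, so it is best started in a screen or tmux session on the login node.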

Upvotes: 1
