Ian

Reputation: 1166

Permit jobs with run time limit that will end by specified time?

We set all our SLURM nodes to "drain" in preparation for maintenance windows, after which all new jobs stay pending until the nodes resume. We do this well before the maintenance window, though, so that all running jobs can finish, and that wastes quite a bit of cluster time. Is there a way to specify that nodes will only accept batch jobs with a --time=x argument such that the job start time + x is earlier than a given time? For example, if a maintenance outage is scheduled for Friday night, jobs reaching the top of the queue on Wednesday with --time=2-0 would run, but jobs reaching the top of the queue on Thursday with --time=2-0 would not.

Upvotes: 1

Views: 623

Answers (1)

Carles Fenoy

Reputation: 5357

You should probably create a reservation covering all the nodes instead of draining them. The following command (untested) should do the trick:

scontrol create reservation reservationname="maintenance1" start=03/31T08:00 Duration=10-00 Nodes=ALL Users=root

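This will create a reservation for all the nodes, usable only by root, starting on March 31st at 08:00 and lasting 10 days. Until the reservation starts, Slurm only starts jobs whose time limit lets them finish before it begins; jobs that would overlap it stay pending. That is the behaviour you are asking for, without draining nodes in advance. A reservation is also good practice because, once the maintenance is finished, you can submit some jobs as root to test that the cluster is working as expected before releasing it to users.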
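The condition from the question (start time plus --time must fall before the outage) can be sketched as a small shell check. This is only an illustration of the arithmetic, not Slurm's scheduler code: `fits_before` is a hypothetical helper, the dates are made up (a Friday 20:00 outage), and it assumes GNU date.

```shell
# Sketch only: "start + time limit <= reservation start".
# fits_before is a hypothetical helper, not part of Slurm. Requires GNU date.
# Usage: fits_before START LIMIT_HOURS RESERVATION_START
# Times are strings such as "2024-03-27 09:00", interpreted as UTC.
fits_before() {
    fb_start=$(date -u -d "$1" +%s) || return 2
    fb_resv=$(date -u -d "$3" +%s) || return 2
    # The job may start only if it is guaranteed to end by the reservation start.
    [ $(( fb_start + $2 * 3600 )) -le "$fb_resv" ]
}

# Illustrative outage: Friday 2024-03-29 20:00. Both jobs request --time=2-0 (48 hours).
for start in "2024-03-27 09:00" "2024-03-28 09:00"; do
    if fits_before "$start" 48 "2024-03-29 20:00"; then
        echo "job starting $start: can run"
    else
        echo "job starting $start: stays pending"
    fi
done
```

The Wednesday start ends Friday morning and fits; the Thursday start would end on Saturday and does not.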

You can remove a reservation with:

scontrol delete reservationname="maintenance1"
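For completeness, here is a variant of the create step written with the ISO-style time format and the maint and ignore_jobs reservation flags, followed by a check of the result. Treat it as a sketch that has not been run on a live cluster: the start time is illustrative, and the `run` wrapper with its `DRY_RUN` variable is a hypothetical convenience for previewing the commands before executing them as root.

```shell
# Preview by default; set DRY_RUN=0 to actually execute (needs Slurm admin rights).
# run and DRY_RUN are hypothetical helpers, not part of Slurm.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Illustrative start time. ignore_jobs allows the reservation to be created
# while jobs are still running on the nodes; maint marks it as maintenance.
run scontrol create reservation ReservationName=maintenance1 \
    StartTime=2024-03-29T20:00:00 Duration=10-00 \
    Nodes=ALL Users=root Flags=maint,ignore_jobs

# Confirm what was created.
run scontrol show reservation maintenance1
```

Once the maintenance is over, `run scontrol delete ReservationName=maintenance1` removes the reservation in the same way.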

Upvotes: 3
