Reputation: 871
I apologize in advance if this is a naïve question, as I'm still fairly new to using commands like qstat to manage jobs on a queue.
I have several hundred jobs to run, and they vary in how long they take to complete. My cluster will go down for an 8-hour maintenance window on a specific date, and I am curious whether there is a way to "pause" the jobs that are still running at a particular time and then, once maintenance is done, resume them.
Some commands I have found include qmod, and qhold followed by qrls.
Is there anything I am missing that can allow me to do this?
Thanks in advance!
Upvotes: 0
Views: 74
Reputation: 26
The feature you are looking for is "suspend".
If you suspend the queue instance on a node with qmod -sq <queue>@<node>, the jobs running in that queue on that node will pause. You can then resume them with qmod -usq <queue>@<node>.
If you do not have the appropriate permissions to suspend queues, you can suspend individual jobs with qmod -sj <jobID> and resume them later with qmod -usj <jobID>.
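With several hundred jobs, suspending them one by one gets tedious, so here is a rough sketch of how you might loop over your running jobs. It assumes the usual qstat layout (two header lines, job ID in the first column); the function names and the DRY_RUN switch are my own invention for illustration, not part of Grid Engine.

```shell
#!/bin/sh
# Extract job IDs from qstat-style output on stdin.
# Assumption: two header lines, then one job per line with the
# numeric job ID in the first column. Array-job tasks share an ID,
# so duplicates are collapsed with sort -u.
running_job_ids() {
    awk 'NR > 2 && $1 ~ /^[0-9]+$/ { print $1 }' | sort -u
}

# Suspend every running job owned by the current user.
# With DRY_RUN=1 the qmod commands are only printed, not executed.
suspend_my_jobs() {
    qstat -u "$USER" -s r | running_job_ids | while read -r job_id; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "qmod -sj $job_id"
        else
            qmod -sj "$job_id"
        fi
    done
}
```

I would run it once with DRY_RUN=1 and check the printed commands before calling suspend_my_jobs for real.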
Note: If the nodes are restarted during the maintenance, any job on those nodes, running or suspended, will be killed.
These commands take effect as soon as they are executed. You can use at to schedule a command to run automatically at a specific time in the future.
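As a minimal sketch of the at approach, something like the following could queue the suspend ahead of time. The function name, the DRY_RUN switch, and the queue instance all.q@node01 are placeholders I made up; the timestamp uses the [[CC]YY]MMDDhhmm form accepted by at -t.

```shell
#!/bin/sh
# Queue a "qmod -sq" call with at(1) so the suspend happens unattended.
# $1: queue instance to suspend (e.g. all.q@node01, a placeholder name)
# $2: timestamp in at -t format, [[CC]YY]MMDDhhmm
# With DRY_RUN=1 nothing is scheduled; the plan is only printed.
schedule_suspend() {
    queue_instance="$1"
    when="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run at $when: qmod -sq $queue_instance"
    else
        # at reads the command to run from standard input
        echo "qmod -sq $queue_instance" | at -t "$when"
    fi
}
```

For example, schedule_suspend all.q@node01 202406011755 would queue the suspend for 17:55 on 1 June 2024, and atq should then list the pending entry. Resuming after maintenance would be the same idea with qmod -usq.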
Upvotes: 0