florence-y

Reputation: 871

Is there a Sun Grid Engine / queuing / batch job management command that pauses job submissions at a particular time point then restarts them later?

I apologize in advance if this is a naïve question as I'm still fairly new to using commands like qstat to manage jobs on a queue.

I have several hundred jobs that I want to run, and they vary in how long they take to complete. My cluster will be going down for an 8-hour maintenance period on a specific date, and I am curious whether there is a way to "pause" the jobs that are still running at a particular time point and then, once maintenance on the cluster is done, resume them.

The commands I have found so far are qmod, and qhold followed by qrls.

Is there anything I am missing that can allow me to do this?

Thanks in advance!

Upvotes: 0

Views: 74

Answers (1)

licens

Reputation: 26

The feature you are looking for is "suspend".

If you suspend the queue instance on a node with qmod -sq <queue>@<node>, the jobs running in that queue instance will pause. You can then resume them with qmod -usq <queue>@<node>.

If you do not have the permissions needed to suspend queues, you can suspend individual jobs with qmod -sj <jobID> and resume them with qmod -usj <jobID>.
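With several hundred jobs, suspending them one at a time is tedious, so a small loop may help. The following is only a sketch: it assumes qstat prints its usual two header lines with the job ID in the first column, and the function names job_ids and suspend_running_jobs are made up for illustration. It prints the qmod commands by default instead of running them.

```shell
#!/bin/sh
# Extract job IDs from qstat-style output read on stdin.
# Assumption: two header lines, numeric job ID in the first column.
job_ids() {
    awk 'NR > 2 && $1 ~ /^[0-9]+$/ { print $1 }' | sort -u
}

# List your running jobs and suspend each one.
# Prints the commands by default; set DRY_RUN=0 to actually run qmod.
suspend_running_jobs() {
    qstat -u "$USER" -s r | job_ids | while read -r id; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qmod -sj $id"
        else
            qmod -sj "$id"
        fi
    done
}
```

Calling suspend_running_jobs first shows what would be suspended; running it again with DRY_RUN=0 applies it. The same loop with qmod -usj would resume the jobs afterwards.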

Note: If the nodes are restarted during the maintenance, any job on those nodes, whether running or suspended, will be killed.

These commands take effect immediately when you run them. You can use "at" to schedule a command to run automatically at a specific time in the future.
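As a rough sketch of the scheduling step, the script below builds one "at" job to suspend a queue instance and another to resume it. The queue instance all.q@node01 and both timestamps are hypothetical placeholders, and schedule is a helper name invented here. It assumes qmod is on the PATH of the shell that "at" starts and that the host running "at" stays up through the maintenance window. By default it only prints the pipelines it would run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own queue instance and times.
QUEUE_INSTANCE="all.q@node01"
SUSPEND_AT="202501102200"   # at -t format: [[CC]YY]MMDDhhmm
RESUME_AT="202501110600"    # eight hours later

# schedule WHEN COMMAND
# Prints the pipeline by default; set DRY_RUN=0 to hand it to at for real.
schedule() {
    when="$1"
    cmd="$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'echo "%s" | at -t %s\n' "$cmd" "$when"
    else
        printf '%s\n' "$cmd" | at -t "$when"
    fi
}

schedule "$SUSPEND_AT" "qmod -sq $QUEUE_INSTANCE"
schedule "$RESUME_AT" "qmod -usq $QUEUE_INSTANCE"
```

Once the jobs are queued with DRY_RUN=0, atq lists them and atrm removes one if the maintenance window moves.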

Upvotes: 0
