Reputation: 3912
This may be a cluster-specific issue that can only be addressed by an admin, but when I have a low-priority job running and a high-priority one comes along, the low-priority job's process is killed.
When the high-priority job finishes, the low-priority job is restarted. Is there a way on the user end to make it suspend on the machine it was originally started on, via SIGSTOP or something similar, without killing the process? Unfortunately, checkpointing is not an option here, so I would like to be able to hold the job without throwing away what's in memory.
We do have ssh to this machine, so if all else fails, I'm tempted just to do a really sloppy scripting hack to get the desired behavior:
1. start the process locally
2. send a SIGSTOP
3. make the job script send SIGCONT and just spin watching the process
4. when the job gets suspended, send a SIGSTOP again
5. when the job gets resumed, it should just send a SIGCONT
but I would much rather do everything within SGE to avoid any nasty surprises.
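For reference, the signal handling in steps 2 to 5 could be sketched roughly as below. This is only a minimal sketch of the stop/continue primitives, assuming a POSIX system; the helper names `suspend`, `resume`, and `is_alive` are made up for illustration, and how the job script would learn that SGE has suspended or resumed it is deliberately left out.

```python
import os
import signal


def suspend(pid: int) -> None:
    """Freeze the process in place; its memory is not discarded."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (running or stopped)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A watcher loop in the job script would call `suspend(pid)` and `resume(pid)` at the appropriate moments and poll `is_alive(pid)` to know when the real work has finished.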
Upvotes: 0
Views: 1622
Reputation: 74475
The suspend/stop mechanism in SGE is controlled on a per-queue basis by the properties suspend_method, resume_method and terminate_method. The defaults are:
- suspend_method: send SIGSTOP
- resume_method: send SIGCONT
- terminate_method: send SIGKILL
Unless someone has changed these default values, I can see no reason for SGE to kill the jobs instead of stopping them.
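To check whether these properties have been changed on your queue, you could look at the queue configuration printed by `qconf -sq`. Below is a small sketch that pulls the three properties out of that output; it assumes the usual "attribute, whitespace, value" line layout, and the function names and the queue name `all.q` are only examples. As far as I know, an unmodified queue shows NONE for all three, which means the default signal is used.

```python
import subprocess

METHOD_KEYS = ("suspend_method", "resume_method", "terminate_method")


def queue_signal_methods(qconf_output: str) -> dict:
    """Extract the three signal-method attributes from `qconf -sq` style text."""
    methods = {}
    for line in qconf_output.splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest of the line
        if len(parts) == 2 and parts[0] in METHOD_KEYS:
            methods[parts[0]] = parts[1].strip()
    return methods


def show_queue_methods(queue: str) -> dict:
    """Run `qconf -sq <queue>` and return its signal-method settings."""
    result = subprocess.run(
        ["qconf", "-sq", queue], capture_output=True, text=True, check=True
    )
    return queue_signal_methods(result.stdout)
```

On a submit host, `show_queue_methods("all.q")` would return a dictionary with the three settings. Anything other than the defaults there would explain a job being killed rather than stopped.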
Upvotes: 1