Alex Rothberg

Reputation: 10993

Purging Dead Nodes from SGE

The output of qstat -g c indicates that I have some dead nodes (formally, state 'cdsuE'):

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.11     18      0      9     37      0     10 

Is there an easy way to purge or remove these nodes from the queue?

SGE is smart enough to not allocate work to them but they do clutter up various displays.

Upvotes: 0

Views: 1925

Answers (2)

William Hay

Reputation: 2308

If you just want to remove them from the queue, you can do that with:

qconf -dattr queue hostlist <nodename> all.q

or, if they're incorporated via a hostgroup:

qconf -dattr hostgroup hostlist <nodename> <hostgroup>

This does the minimum needed to get them out of the queue, but makes it easy to add them back if you manage to resurrect them later.

If there are any ghost jobs on the node, use qdel -f to get rid of them.
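
To illustrate the remove-and-restore round trip, here is a minimal sketch. The restore step uses qconf -aattr, the add counterpart of -dattr, which this answer does not mention explicitly. The function names and the DRY_RUN variable are hypothetical, made up for this example; with DRY_RUN=1 the command is only printed so it can be checked before touching the cluster.

```shell
#!/bin/sh
# Hypothetical helpers: the names and DRY_RUN are illustrative, not part of SGE.
# With DRY_RUN=1 the qconf command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a node out of a queue's hostlist (the -dattr form shown above).
# usage: drop_from_queue <nodename> <queue>
drop_from_queue() {
    run qconf -dattr queue hostlist "$1" "$2"
}

# Put the node back later with -aattr once it has been revived.
# usage: restore_to_queue <nodename> <queue>
restore_to_queue() {
    run qconf -aattr queue hostlist "$1" "$2"
}
```

For a node that comes in via a hostgroup, the same pattern should apply with "hostgroup" in place of "queue" and the hostgroup name as the last argument.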

Upvotes: 0

Finch_Powers

Reputation: 3106

I do it the hard way.

  1. Kill the jobs "running" or stuck on dead nodes.
  2. Run the qconf remove node pipeline:

qconf -dattr hostgroup hostlist <nodealias> @allhosts
qconf -purge queue slots all.q@<nodealias>
qconf -dconf <nodealias>
qconf -de <nodealias>
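
If there are several dead nodes, the four commands above could be wrapped in a small function. This is only a sketch: purge_node and DRY_RUN are hypothetical names, and it assumes every node sits in the @allhosts hostgroup and in all.q, as in the commands above. It does not cover step 1 (killing the stuck jobs), which still has to be done first.

```shell
#!/bin/sh
# Sketch only: purge_node and DRY_RUN are hypothetical names, not SGE features.
# Assumes the node is a member of @allhosts and of all.q.
purge_node() {
    node=$1
    if [ -z "$node" ]; then
        echo "usage: purge_node <nodealias>" >&2
        return 1
    fi
    # With DRY_RUN=1 every command is echoed rather than executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then run=echo; else run=; fi

    # The steps are deliberately not chained with &&, so a step that has
    # nothing to remove for this node does not stop the ones after it.
    $run qconf -dattr hostgroup hostlist "$node" @allhosts  # leave the hostgroup
    $run qconf -purge queue slots "all.q@$node"             # drop per-host slots
    $run qconf -dconf "$node"                               # delete local configuration
    $run qconf -de "$node"                                  # delete the execution host
}
```

A dry run over a list of nodes (the node names here are placeholders) would look like: DRY_RUN=1; for n in node01 node02; do purge_node "$n"; done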

Upvotes: 1
