Fadecomic

Reputation: 1268

Running command on master node from qsub submission script

Using Sun Grid Engine, is there a way to run a command on the master node from within the qsub submission script? If I run /bin/hostname from within a qsub script, I'm already on one of the queue computers and not the master node. In short, I want to automatically run qstat on the job I just submitted. If I try to run qstat from one of the worker nodes, I get an error message telling me the worker node is neither a submit nor an admin host.

I realize I can do this from outside of the qsub script, but the script defines many useful variables, such as the job name and the SGE job ID.

Upvotes: 0

Views: 1384

Answers (2)

Vince

Reputation: 3395

If your aim is simply to get details about the submitted job, you may be better off using the environment variables that Grid Engine sets for the job, i.e., those available within the job script. See the ENVIRONMENTAL VARIABLES section of the qsub manual page (man qsub):

ENVIRONMENTAL VARIABLES
     SGE_ROOT       Specifies the location of the Sun Grid Engine
                    standard configuration files.

     SGE_CELL       If set, specifies the default Sun Grid Engine
                    cell. To address a Sun Grid Engine cell qsub,
                    qsh, qlogin or qalter use (in  the  order  of
                    precedence):

                         The name of the cell  specified  in  the
                         environment  variable SGE_CELL, if it is
                         set.

                         The  name  of  the  default  cell,  i.e.
                         default.


     SGE_DEBUG_LEVEL
                    If  set,  specifies  that  debug  information
                    should  be written to stderr. In addition the
                    level of detail in which debug information is
                    generated is defined.

     SGE_QMASTER_PORT
                    If set,  specifies  the  tcp  port  on  which
                    sge_qmaster(8) is expected to listen for com-
                    munication requests.  Most installations will
                    use  a  services  map  entry  for the service
                    "sge_qmaster" instead to define that port.

     DISPLAY        For qsh jobs the DISPLAY has to be  specified
                    at job submission.  If the DISPLAY is not set
                    by using the -display or the -v  switch,  the
                    contents  of the DISPLAY environment variable
                    are used as default.

     In addition to those environment variables specified  to  be
     exported  to the job via the -v or the -V option (see above)
     qsub, qsh, and qlogin add the following variables  with  the
     indicated values to the variable list:


     SGE_O_HOME     the home directory of the submitting client.

     SGE_O_HOST     the name of the host on which the  submitting
                    client is running.

     SGE_O_LOGNAME  the LOGNAME of the submitting client.

     SGE_O_MAIL     the MAIL of the submitting  client.  This  is
                    the mail directory of the submitting client.

     SGE_O_PATH     the executable search path of the  submitting
                    client.

     SGE_O_SHELL    the SHELL of the submitting client.

     SGE_O_TZ       the time zone of the submitting client.

     SGE_O_WORKDIR  the absolute  path  of  the  current  working
                    directory of the submitting client.

     Furthermore, Sun Grid Engine sets additional variables  into
     the job's environment, as listed below.

     ARC

     SGE_ARCH       The Sun Grid Engine architecture name of  the
                    node on which the job is running. The name is
                    compiled-in into the sge_execd(8) binary.

     SGE_CKPT_ENV   Specifies the checkpointing  environment  (as
                    selected with the -ckpt option) under which a
                    checkpointing  job  executes.  Only  set  for
                    checkpointing jobs.

     SGE_CKPT_DIR   Only set  for  checkpointing  jobs.  Contains
                    path  ckpt_dir  (see  checkpoint(5)  ) of the
                    checkpoint interface.

     SGE_STDERR_PATH
                    the pathname of the file to which  the  stan-
                    dard  error  stream  of  the job is diverted.
                    Commonly used for enhancing the  output  with
                    error  messages from prolog, epilog, parallel
                    environment   start/stop   or   checkpointing
                    scripts.

     SGE_STDOUT_PATH
                    the pathname of the file to which  the  stan-
                    dard  output  stream  of the job is diverted.
                    Commonly used for enhancing the  output  with
                    messages   from   prolog,   epilog,  parallel
                    environment   start/stop   or   checkpointing
                    scripts.

     SGE_STDIN_PATH the pathname of the file from which the stan-
                    dard  input  stream of the job is taken. This
                    variable might be used  in  combination  with
                    SGE_O_HOST   in   prolog/epilog   scripts  to
                    transfer the input file from  the  submit  to
                    the execution host.

     SGE_JOB_SPOOL_DIR
                    The  directory  used  by  sge_shepherd(8)  to
                    store  job related data during job execution.
                    This directory is owned by root or by  a  Sun
                    Grid  Engine  administrative account and com-
                    monly is not open for read or write access to
                    regular users.

     SGE_TASK_ID    The index number of  the  current  array  job
                    task (see -t option above). This is an unique
                    number in each array job and can be  used  to
                    reference  different  input data records, for
                    example. This environment variable is set  to
                    "undefined"  for non-array jobs. It is possi-
                    ble to change the predefined  value  of  this
                    variable with -v or -V (see options above).

     SGE_TASK_FIRST The index number of the first array job  task
                    (see  -t  option  above).  It  is possible to
                    change the predefined value of this  variable
                    with -v or -V (see options above).

     SGE_TASK_LAST  The index number of the last array  job  task
                    (see  -t  option  above).  It  is possible to
                    change the predefined value of this  variable
                    with -v or -V (see options above).

     SGE_TASK_STEPSIZE
                    The step size of the array job  specification
                    (see  -t  option  above).  It  is possible to
                    change the predefined value of this  variable
                    with -v or -V (see options above).

     ENVIRONMENT    The ENVIRONMENT variable is set to  BATCH  to
                    identify that the job is being executed under
                    Sun Grid Engine control.

     HOME           The  user's  home  directory  path  from  the
                    passwd(5) file.

     HOSTNAME       The hostname of the node on which the job  is
                    running.

     JOB_ID         A   unique   identifier   assigned   by   the
                    sge_qmaster(8)  when  the  job was submitted.
                    The job ID is a decimal integer in the  range
                    1 to 99999.

     JOB_NAME       The job name. For batch jobs or jobs  submit-
                    ted  by  qrsh with a command, the job name is
                    built as basename of the qsub script filename
                    resp. the qrsh command.  For interactive jobs
                    it is set  to  `INTERACTIVE'  for  qsh  jobs,
                    `QLOGIN'  for  qlogin  jobs and `QRLOGIN' for
                    qrsh jobs without a command.

                    This default may be overwritten  by  the  -N.
                    option.

     JOB_SCRIPT     The path to the job script which is executed.
                    The value can not be overwritten by the -v or
                    -V option.

     LOGNAME        The user's  login  name  from  the  passwd(5)
                    file.

     NHOSTS         The number of hosts in use by a parallel job.

     NQUEUES        The number of queues allocated  for  the  job
                    (always 1 for serial jobs).

     NSLOTS         The number of queue slots in use by a  paral-
                    lel job.

     PATH           A default shell search path of:
                    /usr/local/bin:/usr/ucb:/bin:/usr/bin

     SGE_BINARY_PATH
                    The path where the Sun Grid  Engine  binaries
                    are installed. The value is the concatenation
                    of   the    cluster    configuration    value
                    binary_path   and   the   architecture   name
                    $SGE_ARCH environment variable.

     PE             The parallel environment under which the  job
                    executes (for parallel jobs only).

     PE_HOSTFILE    The path of a file containing the  definition
                    of the virtual parallel machine assigned to a
                    parallel job by  Sun  Grid  Engine.  See  the
                    description  of the $pe_hostfile parameter in
                    sge_pe(5) for details on the format  of  this
                    file. The environment variable is only avail-
                    able for parallel jobs.

     QUEUE          The name of the cluster queue  in  which  the
                    job is running.

     REQUEST        Available for batch jobs only.

                    The request name of a job as  specified  with
                    the  -N  switch  (see  above) or taken as the
                    name of the job script file.

     RESTARTED      This variable is set to 1 if a job  was  res-
                    tarted either after a system crash or after a
                    migration in case of a checkpointing job. The
                    variable has the value 0 otherwise.

     SHELL          The user's login  shell  from  the  passwd(5)
                    file. Note: This is not necessarily the shell
                    in use for the job.

     TMPDIR         The absolute  path  to  the  job's  temporary
                    working directory.

     TMP            The same as TMPDIR; provided for  compatibil-
                    ity with NQS.

     TZ             The  time   zone   variable   imported   from
                    sge_execd(8) if set.

     USER           The user's  login  name  from  the  passwd(5)
                    file.

     SGE_JSV_TIMEOUT
                    If the response time of  the  client  JSV  is
                    greater than this timeout value, then the JSV
                    will attempt to be  re-started.  The  default
                    value  is  10 seconds, and this value must be
                    greater than  0.  If  the  timeout  has  been
                    reached,  the  JSV  will only try to re-start
                    once, if the  timeout  is  reached  again  an
                    error will occur.
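
As a rough illustration of the variables listed above, a job script can report on itself without calling qstat at all. This is only a sketch: the job name envdemo and the job_summary function are made up for the example, and the "unknown" fallbacks exist only so that the script also runs outside of a Grid Engine job.

```shell
#!/bin/sh
#$ -N envdemo
#$ -cwd

# Print a one-line summary built from variables that Grid Engine places in
# the job environment (JOB_ID, JOB_NAME, SGE_O_HOST, HOSTNAME, QUEUE).
# The ":-unknown" fallbacks are an assumption for running outside a job.
job_summary() {
    printf 'job %s (%s), submitted from %s, running on %s in queue %s\n' \
        "${JOB_ID:-unknown}" "${JOB_NAME:-unknown}" \
        "${SGE_O_HOST:-unknown}" "${HOSTNAME:-unknown}" "${QUEUE:-unknown}"
}

job_summary
```

Submitted with qsub, the summary line should end up in the job's standard output file (the path in SGE_STDOUT_PATH).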

Upvotes: 1

clusterdude

Reputation: 623

The client commands must be accessible from the node where the job runs. Try supplying the full path to qstat; it may be the same path as on the head node. If qstat is not found there, you'll have to install it on the compute nodes (or ask the admin to do that).
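
One way to try the full-path idea from inside the job script is sketched below. It assumes the binaries live under SGE_BINARY_PATH (a variable described in the qsub man page); find_qstat is a made-up helper name. Even when qstat is found, the qmaster may still refuse the request if the execution host is not a submit or admin host, so the sketch only reports that case rather than failing the job.

```shell
#!/bin/sh

# Print the path to qstat: prefer $SGE_BINARY_PATH/qstat, otherwise fall back
# to whatever "qstat" resolves to on PATH. Returns non-zero if neither exists.
# (find_qstat is a hypothetical helper, not part of Grid Engine.)
find_qstat() {
    if [ -n "${SGE_BINARY_PATH:-}" ] && [ -x "$SGE_BINARY_PATH/qstat" ]; then
        printf '%s\n' "$SGE_BINARY_PATH/qstat"
    else
        command -v qstat
    fi
}

if qstat_bin=$(find_qstat) && [ -n "${JOB_ID:-}" ]; then
    # -j asks qstat for the details of a single job.
    "$qstat_bin" -j "$JOB_ID" || echo "qstat was found but the request failed" >&2
else
    echo "qstat not found or JOB_ID not set; skipping" >&2
fi
```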

Edit: some admins don't like to allow this, since "qstat spam" can overload the server on a busy enough system. If you are able to call qstat from a job, do so sparingly and don't call it every few seconds.
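
If the job does need to call qstat more than once, a simple throttle keeps it polite. The following is a minimal sketch, not anything Grid Engine provides: throttled, MIN_INTERVAL, and STAMP_FILE are hypothetical names, the 300-second default is arbitrary, and date +%s is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh

# Hypothetical settings: minimum number of seconds between calls, and a file
# that remembers when the last call happened.
MIN_INTERVAL=${MIN_INTERVAL:-300}
STAMP_FILE=${STAMP_FILE:-${TMPDIR:-/tmp}/qstat.last.$$}

# Run "$@" only if at least MIN_INTERVAL seconds have passed since the last
# run recorded in STAMP_FILE; otherwise return 1 without running anything.
throttled() {
    now=$(date +%s)
    last=0
    [ -r "$STAMP_FILE" ] && last=$(cat "$STAMP_FILE")
    if [ $((now - last)) -lt "$MIN_INTERVAL" ]; then
        return 1
    fi
    printf '%s\n' "$now" > "$STAMP_FILE"
    "$@"
}
```

Inside a job this could be used as throttled qstat -j "$JOB_ID", so that repeated calls within the interval are silently skipped.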

Upvotes: 0
