Biggsy

Reputation: 1374

How can I find out the "command" (batch script filename) of a finished SLURM job?

I often have lots of SLURM jobs running from different directories, so it is useful to query the workdir of the jobs. I can do this for jobs in the queue (e.g. pending, running, etc.) with something like this:

squeue -u $USER -o "%i %Z"

and I can do this for finished jobs (e.g. completed, timeout, cancelled, etc.) with something like this:

sacct -u $USER -o JobID,WorkDir

The problem is that sometimes I have a directory with two (or more) SLURM batch scripts in it, e.g. submit.sh and restart.sh. Therefore, it is also useful to query the "command" of the jobs, i.e. the filename of the batch script. I can do this for jobs in the queue with something like this:

squeue -u $USER -o "%i %o"

However, from checking the sacct documentation and playing around with sacct, there appears to be no equivalent option, so I cannot currently get the command for finished jobs. I also cannot use the squeue method for finished jobs: it just says "slurm_load_jobs error: Invalid job id specified", because finished jobs are not included in the squeue list. So, how can I find out the command of a finished SLURM job (using sacct or otherwise)?

Upvotes: 3

Views: 1829

Answers (1)

damienfrancois

Reputation: 59350

Indeed, Slurm does not store the command in the accounting database. Two workarounds:

For a single user: use the JobName or Comment to store the script name upon submission. These are stored in the database, but this approach is error-prone;

Cluster-wide: enable the Elasticsearch job completion plugin (JobCompType=jobcomp/elasticsearch in slurm.conf), as this stores not only the script name but the whole contents of the script as well.
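
The single-user workaround could look something like the sketch below. The wrapper name sbatch_named is hypothetical (it is not part of Slurm); it only relies on the standard sbatch --job-name option, and it assumes you are happy to overwrite any job name set by an #SBATCH directive inside the script, since command-line options take precedence.

```shell
# Hypothetical helper (not part of Slurm): submit a batch script and
# record its filename as the job name, so sacct can report it later.
sbatch_named() {
    script="$1"
    shift
    # Any remaining arguments are passed through to sbatch as options.
    # --job-name given on the command line overrides an #SBATCH --job-name
    # directive in the script.
    sbatch --job-name="$(basename "$script")" "$@" "$script"
}

# Usage on a cluster:
#   sbatch_named restart.sh
#   sacct -u "$USER" -o JobID,JobName%30,WorkDir
```

The %30 widens the JobName column so longer filenames are not truncated. This is error-prone in the sense that it only works for jobs submitted through the wrapper.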

Upvotes: 1
