Reputation: 2784
In PBS, one can query a specific job with qstat -f
and obtain all (?) the details needed to reproduce the job:
# qstat -f 1234
Job Id: 1234.login
Job_Name = job_name_here
Job_Owner = user@pbsmaster
...
Resource_List.select = 1:ncpus=24:mpiprocs=24
Resource_List.walltime = 23:59:59
...
Variable_List = PBS_O_HOME=/home/user,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=user,...
etime = Mon Apr 20 16:38:27 2020
Submit_arguments = run_script_here --with-these flags
How may I extract the same information from SLURM?
scontrol show job %j
only works for jobs that are currently running or that terminated up to 5 minutes ago.
Edit: I'm currently using the following to obtain some information, but it's not as complete as the output of qstat -f:
sacct -u $USER \
-S 2020-05-13 \
-E 2020-05-15 \
--format "Account,JobID%15,JobName%20,State,ExitCode,Submit,CPUTime,MaxRSS,ReqMem,MaxVMSize,AllocCPUs,ReqTres%25"
I usually pipe this into | (head -n 2; grep -v COMPLETED) | sort -k12
to inspect only the failed runs.
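One caveat with the (head -n 2; grep -v COMPLETED) idiom: when reading from a pipe, head can consume more than the two header lines, so grep may never see some rows. A minimal sketch of a more robust filter, assuming the first two lines of the sacct output are the header and its separator; the function name failed_only is hypothetical:

```shell
# Hypothetical helper: keep the two header lines and drop every row that
# mentions COMPLETED. A single awk process reads the whole stream, so no
# rows are lost to read-ahead buffering.
failed_only() {
    awk 'NR <= 2 || !/COMPLETED/'
}

# Usage sketch, only attempted when sacct is actually installed.
if command -v sacct >/dev/null 2>&1; then
    sacct -u "$USER" -S 2020-05-13 -E 2020-05-15 \
        --format "JobID%15,JobName%20,State,ExitCode" | failed_only
fi
```

Sorting is left out on purpose, since piping the result through sort would also reorder the header lines.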
Upvotes: 3
Views: 6652
Reputation: 3701
You can get a list of all jobs that ran on or after a certain date like so:
sacct --starttime 2020-01-01
Then pick the job you are interested in (e.g. job 1234) and print its details with sacct:
sacct -j 1234 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
See the sacct documentation, or run sacct --helpformat,
for a complete list of available fields.
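To get something closer to the one-attribute-per-line layout of qstat -f, the pipe-delimited output of sacct --parsable2 can be transposed. The following is only a sketch: the function name sacct_to_records is hypothetical, and it assumes the first input line is the header and that no field value contains a | character.

```shell
# Hypothetical helper: turn `sacct --parsable2` output (one header line,
# then one line per job or job step) into "Field = value" blocks.
sacct_to_records() {
    awk -F'|' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        {
            for (i = 1; i <= NF; i++) printf "%s = %s\n", name[i], $i
            print ""   # blank line between records
        }
    '
}

# Usage sketch, only attempted when sacct is actually installed.
if command -v sacct >/dev/null 2>&1; then
    sacct -j 1234 --parsable2 \
        --format=JobID,JobName,State,Elapsed,WorkDir | sacct_to_records
fi
```

Which fields are populated depends on how accounting is configured on the cluster, so some values may come back empty.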
Upvotes: 5