Reputation: 2181
I use Slurm to run jobs on a cluster. I would like to get statistics about a job, such as memory used, number of processors, and wall time, and have that information written to the job's log file. I think this was possible with LSF (if I remember correctly and am not confusing it with another platform).
Upvotes: 2
Views: 1614
Reputation: 2076
You can get this information from the Slurm accounting database; see https://slurm.schedmd.com/sacct.html or Find out the CPU time and memory usage of a slurm job. For example: sacct --jobs=12345 --format=NCPUS,MaxRSS,CPUTime
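As a minimal sketch (job name and workload are placeholders), you can also append that call to the end of the batch script itself, so the summary lands in the job's own log. Note that fields such as MaxRSS may be incomplete for steps still running at the moment sacct is called; the epilog approach below avoids that:

#!/bin/sh
#SBATCH --job-name=stats_demo
#SBATCH --output=stats_demo.%j.out

srun ./my_program   # placeholder workload

# Append this job's accounting record to the job log; Elapsed is the wall time
sacct --jobs=${SLURM_JOB_ID} --format=JobID,NCPUS,MaxRSS,Elapsed,CPUTime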
Note: you can run this from an epilog script, so the accounting summary is appended to every job's output automatically. Here is an example epilog.srun:
#!/bin/sh
TMPDIR="/local"
# Extract the job's StdOut path from scontrol and append job usage info to it
stdoutfname=`scontrol show job ${SLURM_JOB_ID} --details | grep "StdOut=" | sed -e 's/.*StdOut=\([^ ][^ ]*\).*/\1/'`
if [ -w "${stdoutfname}" ] && [ "${TMPDIR}" != "" ]; then
    sacct --format JobID,JobName,AveCPUFreq,AveDiskRead,AveRSS,CPUTime,MaxDiskWrite -j ${SLURM_JOB_ID} >> ${stdoutfname}
fi
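For the epilog to run after every job, an administrator has to register it in slurm.conf (I am assuming the stock Epilog hook here; the path below is only an example) and then reconfigure the daemons:

# In slurm.conf; the path is site-specific
Epilog=/etc/slurm/epilog.srun

followed by scontrol reconfigure.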
Alternatively, you can run /usr/bin/time -v <your command> inside your script (the full path matters, so that you get the GNU time binary rather than the shell builtin; see https://stackoverflow.com/a/774601/6352677). Its report will appear in the logs, but the values will not exactly match Slurm's accounting.
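For example, a minimal batch script using it (the program name is a placeholder):

#!/bin/sh
#SBATCH --output=timed_job.%j.out

# The full path selects GNU time, whose -v flag reports maximum resident set
# size, wall clock time, and more; the shell builtin time has no -v option
/usr/bin/time -v ./my_program

GNU time writes its report to stderr, which by default ends up in the same file as the job's stdout.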
Upvotes: 2