rstober

Reputation: 1431

How can I get total CPU usage for a Slurm job?

I'm trying to get the total amount of CPU time used by each job. I've found several promising sacct fields but which one should I use?

According to the documentation (https://computing.llnl.gov/linux/slurm/sacct.html), TotalCPU reflects the total of SystemCPU and UserCPU but NOT the child processes. But I want the total including the child processes...

TotalCPU
    The sum of the SystemCPU and UserCPU time used by the job or job step. The total CPU time of the job may exceed the job's elapsed time for jobs that include multiple job steps. The format of the output is identical to that of the elapsed field.

NOTE: TotalCPU provides a measure of the task's parent process and does not include CPU time of child processes.

The documentation for the other candidate, cputimeraw, doesn't provide the same level of detail:

cputime
    Formatted number of cpu seconds a process was allocated.

cputimeraw
    How much cpu time process was allocated in second format, not formatted like above. 

I'm inclined to use cputimeraw instead of TotalCPU but I want to make sure it's the total including any child processes spawned by the job. The documentation doesn't indicate anything about the child processes one way or the other.
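
One way to compare the candidate fields is to pull them side by side for a single job. The following is only a sketch, assuming sacct is on the PATH and accounting is enabled on the cluster; the helper names (build_sacct_command, parse_sacct_output, query_job) are made up for illustration and are not part of Slurm.

```python
import subprocess

# sacct fields to compare for one job (all are documented sacct format fields).
FIELDS = ["JobID", "Elapsed", "AllocCPUS", "UserCPU",
          "SystemCPU", "TotalCPU", "CPUTimeRAW"]


def build_sacct_command(job_id):
    """Build the sacct argument list (hypothetical helper, not a Slurm API)."""
    return [
        "sacct",
        "-j", str(job_id),
        "--noheader",      # no header row
        "--parsable2",     # '|'-delimited output without a trailing delimiter
        "--format=" + ",".join(FIELDS),
    ]


def parse_sacct_output(text):
    """Turn '|'-delimited sacct lines into one dict per job or job step."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


def query_job(job_id):
    """Run sacct for job_id; this only works on a host with Slurm installed."""
    result = subprocess.run(build_sacct_command(job_id),
                            capture_output=True, text=True, check=True)
    return parse_sacct_output(result.stdout)
```

Calling query_job with a job ID on a login node should return one row for the job plus one per job step, so the TotalCPU and CPUTimeRAW values can be compared directly for a job whose behaviour is known.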

Does anyone have any suggestions?

Thank you,

Robert

Upvotes: 7

Views: 3962

Answers (1)

aurelien

Reputation: 868

The following command gives a nice summary:

seff jobid

Output:

Job ID: jobid
Cluster: cluster
User/Group: doe/clusterusers
State: TIMEOUT (exit code 0)
Nodes: 6
Cores per node: 28
CPU Utilized: 32-01:15:44
CPU Efficiency: 9.54% of 336-00:44:48 core-walltime
Job Wall-clock time: 2-00:00:16
Memory Utilized: 58.76 GB
Memory Efficiency: 8.74% of 672.00 GB
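
The figures in this output can be checked by hand: the core-walltime is nodes × cores per node × wall-clock time, and the CPU efficiency is the CPU time utilized divided by that core-walltime. A minimal sketch of the arithmetic follows; slurm_time_to_seconds is a hypothetical helper, and it only handles the [DD-]HH:MM:SS form shown above, not the fractional-second forms sacct can print.

```python
def slurm_time_to_seconds(value):
    """Convert a Slurm-style [DD-]HH:MM:SS string to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the seff output above.
nodes = 6
cores_per_node = 28
wall = slurm_time_to_seconds("2-00:00:16")       # Job Wall-clock time
cpu_used = slurm_time_to_seconds("32-01:15:44")  # CPU Utilized

# CPU seconds that were allocated to the job (the "core-walltime").
core_walltime = nodes * cores_per_node * wall

# Fraction of the allocated CPU seconds that were actually used.
efficiency = 100 * cpu_used / core_walltime

print(core_walltime)          # 29033088 seconds, i.e. 336-00:44:48
print(round(efficiency, 2))   # 9.54
```

The core-walltime here depends only on the allocation and the elapsed time, while CPU Utilized is the measured usage, which is why the two differ so much for this job.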

Upvotes: 1
