Finncent Price

Reputation: 827

Units for Slurm sinfo CPU Load

I can see the "CPU load" for all the nodes on a cluster using the sinfo command:

sinfo --Node --format="%10N %.6D %10P %10T %20E %.4c %.8z %8O %.6m %10e %.6w %.60f"

The %8O in there asks for a measure of the CPU load that is eight characters wide. The problem I have with this command is that the manual doesn't say what the units are. It just says "CPU load of a node." Is this in percent? Number of processes per CPU? Number of processes per thread? Equivalent number of fully committed CPUs/threads?

Upvotes: 0

Views: 2015

Answers (1)

damienfrancois

Reputation: 59260

In a Linux context, the CPU load has a specific definition: it relates to the number of processes that are being executed or are pending execution (i.e. requesting a CPU but not having access to one).

Often, the load is considered as an average over a short period of time; that is why the various commands that compute the load can report fractional numbers.

A load of 0 means no activity. A load of 1 means the equivalent of 1 CPU core being 100% active during the considered period, or two cores being 50% active, etc. It can be due to one process being CPU bound, or two processes being I/O bound, for instance.

A load larger than the number of CPU cores in the machine indicates that many processes are competing for CPU resources and that context switching occurs.

A load equal to the number of CPU cores in the machine indicates that all cores are busy 100% of the time, which is what is expected on HPC clusters. Most of the time in that case, processes are pinned to their "own" core.
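To make the comparison between load and core count concrete, here is a minimal Python sketch. The function names (`load_per_core`, `describe`), the 10% tolerance, and the three labels are illustrative assumptions, not anything Slurm defines. It simply divides a load figure (such as the one reported by `%O`) by the number of cores (such as the one reported by `%c`).

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Return the load divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load / cores


def describe(load: float, cores: int, tolerance: float = 0.1) -> str:
    """Classify a load value relative to the core count.

    The tolerance and the labels are arbitrary choices for illustration.
    """
    ratio = load_per_core(load, cores)
    if ratio > 1 + tolerance:
        return "oversubscribed"   # more runnable processes than cores
    if ratio >= 1 - tolerance:
        return "fully busy"       # roughly one busy process per core
    return "spare capacity"       # some cores are idle part of the time


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only; it returns
    # the 1-, 5- and 15-minute load averages of the local machine.
    one_minute, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load={one_minute:.2f} cores={cores} -> {describe(one_minute, cores)}")
```

On a 16-core node, for example, a load of 16 would be classified as "fully busy" and a load of 32 as "oversubscribed" under these assumed thresholds.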

See here for more information.

Upvotes: 2
