irritable_phd_syndrome

Reputation: 5067

How does Slurm determine memory usage of jobs?

Recently a user was running an interactive job on our cluster. We use Slurm as the workload manager. He got his allocation via:

 salloc --cpus-per-task=48 --time=14-0 --partition=himem

This requests an entire high-memory (1.5TB) machine on our cluster. He ran his job, and while it was running he got an error message on his screen (something like this):

salloc: Error memory limit exceeded

I logged into the node and, according to top, his job was only taking 310GB in RES. However, slurmd.log contains a slew of errors (spanning 8 hours!) like this:

[2017-08-03T23:21:55.200] [398692.4294967295] Step 398692.4294967295 exceeded memory limit (1588997632 > 1587511296), being killed

QUESTION: Why does top think he's using 310GB while Slurm thinks he is using 1.58TB?

Upvotes: 0

Views: 2246

Answers (1)

damienfrancois

Reputation: 59072

To answer the question: Slurm uses /proc/<pid>/stat to get the memory values. In your case, you were probably not able to witness the offending process because it had already been killed by Slurm, as suggested by @Dmitri Chubarov.
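For reference, here is a minimal sketch (Python, standard library only; the helper name and the use of the script's own PID are just for illustration) of how RSS can be read from /proc/<pid>/stat: the rss field is the 24th field of the stat line and is counted in pages, so it has to be multiplied by the page size to get bytes.

    import os

    def rss_bytes(pid):
        # Read the single-line stat file for the process.
        with open('/proc/%d/stat' % pid) as f:
            stat_line = f.read()
        # Field 2 (comm) can itself contain spaces and parentheses,
        # so split once on the last ')' before splitting the rest.
        rest = stat_line.rsplit(')', 1)[1].split()
        # rss is field 24 of the full line, i.e. index 21 of what is
        # left after pid and comm are stripped; it is counted in pages.
        return int(rest[21]) * os.sysconf('SC_PAGE_SIZE')

    # Example: report this process's own resident set size in GB.
    print(rss_bytes(os.getpid()) / 1024.0**3)

Summed over the processes of a job step, this is essentially the number slurmd compares against the limit derived from the memory request, and it should match the RES column of top for the same PIDs.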

Another possibility is that you have hit a Slurm bug that was corrected only recently, in version 17.02.7. From the change log:

-- Increase buffer to handle long /proc/<pid>/stat output so that Slurm can read correct RSS value and take action on jobs using more memory than requested.
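As a purely hypothetical illustration of that failure mode (this is not Slurm's actual code): if the stat line is read through a buffer that is too short, the fields after the cut-off are lost, and the value sitting in the RSS position is either missing or belongs to a different field.

    import os

    # Read the real stat line, then pretend the read stopped after 64 bytes.
    with open('/proc/%d/stat' % os.getpid()) as f:
        full_line = f.read()
    truncated = full_line[:64]

    for label, line in (('full', full_line), ('truncated', truncated)):
        fields = line.rsplit(')', 1)[1].split()
        # Field 24 (rss) is index 21 after pid and comm are stripped.
        rss = fields[21] if len(fields) > 21 else '<gone>'
        print(label, 'rss field:', rss)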

The fact that Slurm repeatedly tried to kill the process (you mention several occurrences of the entry in the logs) indicates that the machine was running low on RAM and slurmd was having trouble killing the process. I suggest you activate cgroups for task control; it is much more robust.
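A minimal sketch of such a configuration, assuming a fairly standard Slurm setup (the exact values are illustrative and should be adapted to your site): enable the cgroup task and proctrack plugins in slurm.conf and constrain memory in cgroup.conf.

    # slurm.conf (excerpt)
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes

With task/cgroup, the memory limit is enforced by the kernel's cgroup controller rather than by slurmd polling /proc and signalling the job, so the step stays contained even when the node is under memory pressure.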

Upvotes: 1
