mikeraf

Reputation: 91

sbatch binds different jobs to a single core

We use the Slurm resource manager to submit jobs to the cluster. Recently, we upgraded Slurm from version 15 to version 18.

Since the upgrade I have encountered the following problem:
I repeatedly submit jobs that each require a single core and should use ~100% CPU. However, when several of these jobs land on the same compute node, they seem to share a single core: when the first job arrives it gets 100% CPU, when the second arrives they both get ~50%, and so on. Sometimes there are 20 jobs on the same node (which has 24 physical cores) and each gets ~5% CPU.

The setup that reproduces the problem is very simple:
The executable is a simple C busy loop that was verified to consume ~100% CPU when run locally.
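
For reference, a minimal sketch of such a busy loop (the actual executable may differ, but anything along these lines pegs one core when run locally, e.g. compiled with gcc -O2 -o busy_loop busy_loop.c):

/* busy_loop.c - spins forever; should keep a single core at ~100% */
int main(void)
{
    /* volatile stops the compiler from optimizing the loop away */
    volatile unsigned long counter = 0;
    for (;;)
        counter++;
    return 0;
}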
The script file that I send is:

> cat my.sh
#!/bin/bash
/path/to/busy_loop

The sbatch command is:

sbatch -n1 -c1 my.sh
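
Here -n1 -c1 requests a single task with a single CPU per task, so each job should have a core to itself.

To check which CPUs a job's process is actually allowed to run on, a small helper along these lines can be called from the batch script (an illustrative sketch, not part of the original setup); if the jobs are really being pinned together, they will all report the same single CPU:

/* show_affinity.c - prints the CPUs the current process may run on */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return 1;
    }
    /* print every CPU index present in this process's affinity mask */
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf("%d ", cpu);
    printf("\n");
    return 0;
}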

I didn't find any reference to a similar problem on the web, so any pointer or help would be very much appreciated.

Upvotes: 0

Views: 390

Answers (1)

mikeraf

Reputation: 91

After trying different changes in slurm.conf, the change that solved the problem was adding the line:
TaskPlugin=task/affinity
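
As far as I understand it, the task/affinity plugin makes slurmd bind each task to the CPUs it was actually allocated, so single-core jobs that land on the same node no longer end up competing for the same core. Note that, to the best of my knowledge, plugin changes need a restart of slurmctld/slurmd rather than just scontrol reconfigure. The active setting can be verified with:

scontrol show config | grep TaskPlugin

After that, running several of the single-core jobs on one node should show each of them at ~100% CPU.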

Upvotes: 0
