Reputation: 51
I am facing an issue when starting the slurmd service on my compute nodes.
× slurmd.service - Slurm node daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2022-10-12 04:10:25 EDT; 7s ago
    Process: 5839 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 5839 (code=exited, status=1/FAILURE)
        CPU: 3ms
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: Started Slurm node daemon.
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: slurmd.service: Failed with result 'exit-code'.
# slurmd -D -vv
slurmd: debug: Log file re-opened
slurmd: debug: CPUs:1 Boards:1 Sockets:1 CoresPerSocket:1 ThreadsPerCore:1
slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
slurmd: error: cannot find cgroup plugin for cgroup/v2
slurmd: error: cannot create cgroup context for cgroup/v2
slurmd: error: Unable to initialize cgroup plugin
slurmd: error: slurmd initialization failed
What am I missing?
Upvotes: 5
Views: 5699
Reputation: 463
I had the same problem. Slurm supports both cgroup/v1 and cgroup/v2, but v2 support is only compiled in if the dbus development files are present when Slurm is built. So first install dbus-devel:
dnf install dbus-devel
and then run a clean Slurm build.
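To confirm whether the rebuild is needed (or whether it worked), it can help to check which cgroup hierarchy the node uses and whether the cgroup/v2 plugin file was installed. The sketch below is only an illustration: the plugin directories are common defaults and an assumption on my part, the helper names are made up, and a `tmpfs` result is treated as v1 although it can also mean a hybrid setup.

```shell
#!/bin/sh
# Report the cgroup hierarchy mounted at a path (default: /sys/fs/cgroup).
# cgroup2fs means the unified v2 hierarchy; tmpfs usually means v1 or hybrid.
cgroup_mode() {
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)
    case "$fstype" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# Print the path of cgroup_v2.so if it exists in one of the given directories.
find_cgroup_v2_plugin() {
    for dir in "$@"; do
        if [ -f "$dir/cgroup_v2.so" ]; then
            echo "$dir/cgroup_v2.so"
            return 0
        fi
    done
    return 1
}

echo "cgroup mode: $(cgroup_mode)"
# Assumed plugin locations; adjust to the PluginDir of your installation.
find_cgroup_v2_plugin /usr/lib64/slurm /usr/lib/slurm /usr/lib/x86_64-linux-gnu/slurm-wlm \
    || echo "cgroup_v2.so not found in the directories checked"
```

If the node reports v2 and no cgroup_v2.so is found, that matches the "cannot find cgroup plugin for cgroup/v2" error in the question.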
Upvotes: 6
Reputation: 434
You may have to manually create cgroup.conf in your Slurm config directory (see https://stackoverflow.com/a/65226055/5749775).
I fixed this by creating a fairly simple conf:
# /etc/slurm-llnl/cgroup.conf
CgroupAutomount=yes
# CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
# TaskAffinity=yes
ConstrainRAMSpace=yes
# ConstrainSwapSpace=yes
MaxRAMPercent=98
AllowedSwapSpace=0
AllowedRAMSpace=100
MemorySwappiness=0
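A quick way to see whether cgroup.conf sits next to slurm.conf is sketched below. The two directories are assumptions (/etc/slurm-llnl is the one used above, /etc/slurm is a common alternative), and `check_cgroup_conf` is just a hypothetical helper name.

```shell
#!/bin/sh
# For each directory that contains slurm.conf, report whether cgroup.conf
# is present alongside it. Directories without slurm.conf are skipped.
check_cgroup_conf() {
    for dir in "$@"; do
        [ -f "$dir/slurm.conf" ] || continue
        if [ -f "$dir/cgroup.conf" ]; then
            echo "$dir: cgroup.conf present"
        else
            echo "$dir: cgroup.conf missing"
        fi
    done
}

# Assumed config locations; pass your own directory if it differs.
check_cgroup_conf /etc/slurm /etc/slurm-llnl
```

After creating the file, restart slurmd on the node so the new settings are read.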
Upvotes: 2