Reputation: 5510
I'm getting started with Slurm and I was assuming that the submission script I pass to sbatch runs on the controller and the steps that are marked with srun will run as job steps on a compute node. Consider the example below:
#!/bin/bash
#SBATCH --cpus-per-task 12
#SBATCH --gres=gpu:1
#SBATCH --job-name=hello
hostname
srun hostname
I was expecting to see the hostname of the machine I'm submitting from first, followed by the name of the compute node that's allocated for the job. Instead it seems the whole script is run on the compute node. I see the compute node's hostname in the log and then it fails because it can't find srun on that node:
/var/spool/slurm/d/job00201/slurm_script: line 5: srun: command not found
Am I missing something obvious?
Upvotes: 1
Views: 984
Reputation: 5510
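Turns out installing slurmd on the compute node is not enough. The batch script itself runs on the first node of the allocation, so srun has to be available there as well. Installing the slurm-client package pulls in all those s* binaries (srun, sbatch, squeue, and so on).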
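To catch this earlier, the batch script could check that the client commands it needs are on the PATH of the node it lands on before it calls them. The following is a minimal sketch, not anything Slurm provides: have_cmd and missing_cmds are hypothetical helper names, and the check relies only on the shell's standard command -v builtin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeeds if the command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the subset of the given commands that are
# missing on this node, separated by spaces (prints an empty line if none).
# Note: uses plain global variables to stay POSIX sh compatible.
missing_cmds() {
    missing=""
    for cmd in "$@"; do
        have_cmd "$cmd" || missing="${missing:+$missing }$cmd"
    done
    printf '%s\n' "$missing"
}

# Possible use near the top of a batch script (assumption: the script
# only needs srun; extend the list as required):
#
#   m=$(missing_cmds srun)
#   if [ -n "$m" ]; then
#       echo "missing on $(hostname): $m" >&2
#       exit 1
#   fi
#   srun hostname
```

With a guard like this, the job log would name the node and the missing command instead of showing only a bare "command not found" from the shell.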
Upvotes: 1