Milad

Reputation: 5510

Do SLURM submission scripts run on the compute node or the controller machine?

I'm getting started with Slurm, and I assumed that the submission script I pass to sbatch runs on the controller, while the commands launched with srun run as job steps on a compute node. Consider the example below:

#!/bin/bash
#SBATCH --cpus-per-task 12
#SBATCH --gres=gpu:1
#SBATCH --job-name=hello

hostname
srun hostname

I expected to see the hostname of the machine I'm submitting from first, followed by the name of the compute node allocated for the job. Instead, it seems the whole script runs on the compute node: I see the compute node's hostname in the log, and then the script fails because it can't find srun on that node:

/var/spool/slurm/d/job00201/slurm_script: line 5: srun: command not found

Am I missing something obvious?

Upvotes: 1

Views: 984

Answers (1)

Milad

Reputation: 5510

It turns out that installing slurmd on the compute node is not enough. Installing the slurm-client package pulls in the s* client binaries such as srun.
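For context, sbatch runs the batch script on the first node of the allocation, so every command the script calls, srun included, has to be installed on the compute nodes. As a rough sketch (missing_cmds is a hypothetical helper name, not part of Slurm), the script could check for the client tools before using them, so a missing package shows up as a clear message in the job log:

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): print each named command
# that cannot be found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Check for srun on the node where the batch script is running.
missing=$(missing_cmds srun)
if [ -n "$missing" ]; then
    # Report which node lacks the client tools instead of failing later.
    echo "Missing on $(hostname): $missing" >&2
else
    srun hostname
fi
```

If the message appears, installing the client package on that node (slurm-client in my case) should resolve it.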

Upvotes: 1
