toby

Reputation: 11

SLURM batch job - how to run a preparation task once per node on each node that will receive jobs from the same batch file?

I am unable to find any relevant info about running preparation tasks on nodes. I expect this to be a common enough problem that I shouldn't have to create a custom workaround for it.

What I'm looking for (and can't seem to find) is some kind of "prepare" script option for slurm that would run once on each node before launching a set of jobs, and only on the nodes that the jobs will be allocated to.

Does this sort of "prepare" feature exist in slurm?

Here is the scenario I am dealing with: I have a few slurm nodes attached to a Jenkins instance, the Jenkins instance has access to each slurm node, and the slurm jobs that we want to run require some specific files that are generated in the Jenkins flow to be present on each node.

Because each Jenkins job is unique, each slurm node must be prepared during the Jenkins job before slurm jobs are dispatched to the slurm nodes. Currently, we use Jenkins to stash the required files, connect to each jenkins node, unstash the required files (once), and then launch the array of jobs via sbatch. There are two problems here: 1) we are using Jenkins to prepare each slurm node for the jobs, which feels wrong, and 2) because we don't know which nodes will receive the current jobs, we waste resources by preparing all the nodes and cleaning them all up later, even if not all of them end up running the current job.

To better illustrate the point, let's say we have two nodes (node1 and node2) with 10 cores each, and I have a 10-job array that uses 1 core per job. I give the array to sbatch to launch, and it decides that it can fit all the jobs on node1. It should therefore run the "preparation" script on node1 only (and not on node2), and run it only once, before launching any of the actual jobs on that node. If, the next time I want to dispatch jobs to slurm, I have an array of 15 such jobs, then the preparation script should again be run once on each node that receives jobs.

As an alternative to preparing each node, we have tried using a NAS attached to all the nodes to store the relevant files, but preparing everything on the NAS is quite slow because of the many small files, and our jobs also slow down when some of their files live on the NAS. Obviously I can come up with other ways to partially work around this preparation problem, such as ssh, but then I still don't know in advance which nodes the jobs will be launched on. It seems like such an obvious requirement that there should be an option to do this natively in slurm.

Upvotes: 1

Views: 183

Answers (1)

X Zhang

Reputation: 1325

  • If you only have something like 5-10 nodes to care for, I guess things can easily be done by hand using the srun/sbatch -w NODENAME approach.

  • Below is an ugly but practical way of getting things done on a larger scale:

Step 1: over-commit the prepare job under a one-job-per-node rule. Say you have 100 nodes: you could submit 200 prepare tasks and so cover a large proportion of the candidate nodes. Since only one job runs on a node at a time, there will be no race conditions. Repeated submissions that land on an already prepared node quit in a blink, so they will not hog resources.

#!/bin/bash

# Read the node name that slurmd sets for the running job
ID="${SLURMD_NODENAME:-}"

# Abort if the node name is not available
if [ -z "$ID" ]; then
    echo "SLURMD_NODENAME not found in environment variables." >&2
    exit 1
fi

# Marker file recording that this node has been prepared
FILE="${ID}.done"

# Run PREPARE only if the marker is missing; create the marker on success
if [ ! -f "$FILE" ]; then
    echo "File $FILE does not exist. Running PREPARE script."
    ./PREPARE && touch "$FILE"
else
    echo "File $FILE already exists. Skipping PREPARE script."
fi
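
As a rough sketch of how the over-committed submission in Step 1 might look: the helper below only prints an sbatch command line rather than submitting anything. The function name build_prepare_cmd and the file name prepare_wrapper.sh (standing in for the script above) are hypothetical, and using --exclusive to get one job per node is an assumption about how the rule is enforced; your cluster may need a different mechanism.

```shell
#!/bin/sh
# Hypothetical helper: print an sbatch command that submits twice as many
# prepare tasks as there are nodes, each asking for a whole node.
# $1: number of nodes, $2: path of the prepare wrapper script
build_prepare_cmd() {
    nodes=$1
    wrapper=$2
    tasks=$((nodes * 2))
    # --array submits one array task per index;
    # --exclusive (assumed here) keeps other jobs off the allocated node.
    echo "sbatch --array=1-${tasks} --exclusive --nodes=1 ${wrapper}"
}

# Prints: sbatch --array=1-200 --exclusive --nodes=1 prepare_wrapper.sh
build_prepare_cmd 100 prepare_wrapper.sh
```

Once the printed command looks right for your setup, it can be run by hand or piped to sh.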

Step 2: run the subsequent jobs with snakemake --retries and --keep-going, and have each job quit immediately if its node is not prepared. A job will then finish only on a prepared node and is queued for a retry otherwise.
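
The "quit immediately if the node is not prepared" check could be sketched as a small guard at the top of each real job. The function names and the MARKER_DIR variable are hypothetical; MARKER_DIR is assumed to point at wherever the prepare script writes its NODENAME.done marker files.

```shell
#!/bin/sh
# Hypothetical guard helpers for the real jobs.
# MARKER_DIR (assumed) is the directory holding the "<node>.done" markers.

# Succeeds only if a marker file exists for the given node name.
node_is_prepared() {
    [ -n "$1" ] && [ -f "${MARKER_DIR:-.}/$1.done" ]
}

# Runs the given command only on a prepared node; otherwise returns 1
# so the caller can exit non-zero and let the job be retried.
# $1: node name, remaining arguments: command to run
run_if_prepared() {
    node=$1
    shift
    if ! node_is_prepared "$node"; then
        echo "Node '${node}' is not prepared; giving up on this attempt." >&2
        return 1
    fi
    "$@"
}
```

Inside a job script this might be used as run_if_prepared "$SLURMD_NODENAME" ./my_job || exit 1, where ./my_job is a placeholder for the actual workload.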

Upvotes: 0
