Ian

Reputation: 1166

wait for slurm job steps started in background by a separate program

In the following Slurm batch script, where programs step_one and step_two are meant to run at the same time, the wait call is necessary so the job does not terminate before the job steps are done.

#!/bin/bash
#SBATCH --ntasks=2
srun --overlap -n1 step_one &
srun --overlap -n1 step_two &
wait

The wait call blocks until all processes running in the background are done, but it only works for child processes of the current shell. If another program were to launch the processes for which I need to wait, how do I achieve the same result? Without going into details about DVC, just believe me that the following launches the same two steps "in the background" and exits before they are done.

#!/bin/bash
#SBATCH --ntasks=2
dvc repro
wait  # has no effect ... what would?

For those familiar with DVC, here is the pipeline file:

stages:
  one:
    cmd: srun --overlap -n1 step_one &
  two:
    cmd: srun --overlap -n1 step_two &

Here is the closest I can come, but I feel like I'm doing it wrong:

#!/bin/bash
#SBATCH --ntasks=2
dvc repro
while [ "$(sstat -n -a -j "$SLURM_JOB_ID" | wc -l)" -gt 1 ]
do
    sleep 10
done

Note that sstat also reports a job step called "$SLURM_JOB_ID.batch" for the batch script itself, hence -gt 1 rather than -gt 0.
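The polling loop above can be factored into a small reusable helper. This is only a sketch of mine (poll_until is a made-up name, not a Slurm or DVC facility); the Slurm condition would be the sstat pipeline from the script above:

```shell
#!/bin/sh
# poll_until TIMEOUT INTERVAL COMMAND...
# Re-runs COMMAND every INTERVAL seconds until it succeeds,
# giving up (non-zero return) after roughly TIMEOUT seconds.
poll_until() {
    timeout=$1; interval=$2; shift 2
    elapsed=0
    until "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# With Slurm, the condition would be the sstat check, e.g.:
#   poll_until 3600 10 sh -c \
#     '[ "$(sstat -n -a -j "$SLURM_JOB_ID" | wc -l)" -le 1 ]'
```

The timeout guards against waiting forever if a step hangs, which the bare while/sleep loop does not.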


Update: Solutions to a similar problem (that does not involve Slurm) rely on knowing the PID of the non-child processes. To use those, I would at least need the PIDs.
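For context, the usual non-Slurm trick, once you do have a PID, is to poll with kill -0, which sends no signal and only tests whether the process still exists (and whether we may signal it). A sketch, with wait_for_pid being my own name for it; note that unlike the wait builtin, this works on non-children but cannot recover the exit status:

```shell
#!/bin/sh
# wait_for_pid PID: block until the process with PID no longer exists.
# kill -0 delivers no signal; it only checks for the process's existence.
wait_for_pid() {
    while kill -0 "$1" 2>/dev/null; do
        sleep 1
    done
}

# Demo: background a short sleep, then wait on its PID as if it
# belonged to some other program.
sleep 2 & pid=$!
wait_for_pid "$pid"
echo "$pid gone"
```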

Upvotes: 2

Views: 1872

Answers (2)

Ian

Reputation: 1166

Self-answer with my current solution.

My difficulty using DVC with Slurm jobs is that DVC runs stage commands serially (unless you get into queued experiments, which introduces Celery, which would be another queue on top of Slurm ... yikes). If the stage commands run in the background, however, DVC will chug merrily along. But you then have to enforce the DAG yourself; I did this with advisory file system locking. You also don't want to run dvc commit until the backgrounded commands have completed.
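To see the serialization idea in isolation (no Slurm, no DVC), here is a toy sketch: two commands are launched into the background immediately, but because both contend for the same lock file, flock(1) guarantees their bodies never overlap:

```shell
#!/bin/sh
# Both commands start "at once", but flock serializes them on demo.lock,
# so the critical sections run one after the other in some order.
rm -f demo.lock demo.out
flock demo.lock sh -c 'echo start-A >> demo.out; sleep 1; echo end-A >> demo.out' &
flock demo.lock sh -c 'echo start-B >> demo.out; sleep 1; echo end-B >> demo.out' &
wait   # here a plain wait suffices, since we launched the jobs ourselves
cat demo.out   # each start-X line is immediately followed by its end-X
```

Which command wins the lock first is not deterministic; only the mutual exclusion is guaranteed.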

Here's a pipeline with three stages (minimal working examples of each <CMD> are given below). Note that the DAG allows stages one and three to run in parallel, while two must run after one.

stages:
  one:
    cmd: flock lock/a <ONE> &
    outs:
    - one.txt
  two:
    cmd: flock lock/a <TWO> &
    deps:
    - one.txt
    outs:
    - two.txt
  three:
    cmd: flock lock/b <THREE> &
    outs:
    - three.txt

The lock/a and lock/b files are created by the flock command and correspond to the two separate branches of the DAG. Using flock may not be the ultimate solution; the release order of multiple stage commands waiting on the same lock is unclear to me.

Wrap your dvc repro command in a script something like this:

#!/bin/sh
set -e
mkdir lock
dvc repro --no-commit
for item in lock/*
do
    flock "$item" rm "$item"
done
rmdir lock

This script would be your sbatch submission script, but I'm leaving all that out. I'll also leave the srun parts out of the minimal working example below, but you'd need them in your stage commands for Slurm.

When you source job.sh (or sbatch job.sh), the commands all fire into the background and DVC exits. The flock mechanism takes over for releasing commands to run, and the script exits after all locks are released (and cleaned up). You would then run dvc commit.

Here's an example that works without Slurm:

stages:
  one:
    cmd: flock lock/a ./stamp.sh </dev/null >one.txt &
    outs:
    - one.txt
  two:
    cmd: flock lock/a ./stamp.sh <one.txt >two.txt &
    deps:
    - one.txt
    outs:
    - two.txt
  three:
    cmd: flock lock/b ./stamp.sh </dev/null >three.txt &
    outs:
    - three.txt

With executable stamp.sh:

#!/bin/sh
echo "time now is $(date +'%T')"
read line
echo "$line" | sed -e "s/now is/then was/"
sleep 10

Some results:

% source job.sh
Running stage 'three':                                                
> flock lock/b ./stamp.sh </dev/null >three.txt &
WARNING: 'three.txt' is empty.                                        

Running stage 'one':
> flock lock/a ./stamp.sh </dev/null >one.txt &
WARNING: 'one.txt' is empty.                                          

Running stage 'two':
> flock lock/a ./stamp.sh <one.txt >two.txt &
WARNING: 'two.txt' is empty.                                          
Updating lock file 'dvc.lock'

To track the changes with git, run:

    git add dvc.lock

To enable auto staging, run:

    dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
% grep "time" *.txt
one.txt:time now is 11:38:58
three.txt:time now is 11:38:58
two.txt:time now is 11:39:08
two.txt:time then was 11:38:58

Upvotes: 1

Shcheklein

Reputation: 6294

Just an idea off the top of my head; unfortunately I don't know a better solution. In DVC I think it can be solved like this. (It's not a complete solution! You would need to make the wait stage depend on stages one and two, similar to its dependency on zero, so that wait doesn't start before them.)

stages:
  zero:
    cmd:
      - rm -f res* || true
      - date > zero
    outs:
      - zero
    always_changed: true

  one:
    deps:
      - zero
    cmd: (./process1.sh; echo $? > res1) &

  two:
    deps:
      - zero
    cmd: (./process2.sh; echo $? > res2) &

  wait:
    deps:
      - zero
    cmd: ./wait.sh

where wait.sh:

#!/bin/bash

set -eux

while [ ! -f res1 ] || [ ! -f res2 ] ; do sleep 1; done

It becomes ugly pretty quickly tbh :( Primarily because there is no mechanism for a stage to depend on another stage without an explicit out/dep between them.

If you can make the stages output files in some other way in your case (e.g. create a file as soon as they start), that would simplify the logic a bit.
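A sketch of that idea (the names run_stage, started.NAME, and res.NAME are mine, not DVC's): each stage drops a marker file the moment it starts and a result file only when it finishes, so the waiting logic can tell "not started yet" apart from "done":

```shell
#!/bin/sh
# run_stage NAME COMMAND...: marker file on start, exit status on finish.
run_stage() {
    name=$1; shift
    touch "started.$name"        # appears as soon as the stage begins
    "$@"
    echo $? > "res.$name"        # appears only when the stage completes
}

# Demo: launch a stage in the background, then wait on its result file.
run_stage demo sh -c 'sleep 1' &
until [ -f "res.demo" ]; do sleep 1; done
echo "demo exit status: $(cat res.demo)"
```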

Upvotes: 0
