Reputation: 11270
I have a simulation that consists of N steps, run sequentially. Each step modifies a global state in memory, up to the final step, which produces the result. After a step has run, it is possible to write to disk the intermediate state it just computed, and to load such an intermediate state later instead of starting from scratch. Writing and loading intermediate states has a non-negligible cost.
I want to run many variations of this simulation on a Slurm cluster. Each variation changes the parameters of some of the steps.
S1 --> S2 --> S3 --> S4
run1: S2.speed=2, S3.height=12
run2: S2.speed=2, S3.height=20
run3: S2.speed=2, S3.height=40
run4: S2.speed=5, S3.height=12
run5: S2.speed=5, S3.height=80
I want the various runs to share common computations by dumping the intermediate state of the shared steps. This forms a tree of step runs:
S1
├─ S2 (speed=2)
│  ├─ S3 (height=12)
│  │  └─ S4
│  ├─ S3 (height=20)
│  │  └─ S4
│  └─ S3 (height=40)
│     └─ S4
└─ S2 (speed=5)
   ├─ S3 (height=12)
   │  └─ S4
   └─ S3 (height=80)
      └─ S4
I know I can get the result of the 5 runs by running 5 processes:
run1: S1 --> S2 (speed=2) --> S3 (height=12) --> S4
run2: (dump of run1.S2) --> S3 (height=20) --> S4
run3: (dump of run1.S2) --> S3 (height=40) --> S4
run4: (dump of run1.S1) --> S2 (speed=5) --> S3 (height=12) --> S4
run5: (dump of run4.S2) --> S3 (height=80) --> S4
This reduces the computation from 20 steps using a naive approach, to 13 steps with 3 dumps and 4 loads.
Now, my question is: how do I model this with Slurm to make the best use of the scheduler?
One solution I can think of is that each run is responsible for submitting the jobs of the runs that depend on it, after dumping the intermediate state. Run1 would submit run4 after dumping S1, then submit run2 and run3 after dumping S2, and run4 would submit run5 after dumping S2. With this solution, is there any point in declaring the dependency when submitting the job to Slurm?
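For concreteness, with this first approach I imagine run1's job script would look roughly like the sketch below. The simulate command, its --steps/--load/--dump/--set options and the .state file names are placeholders for my actual tooling; only the sbatch calls are real Slurm:

#!/bin/bash
#SBATCH --job-name=run1

# Compute S1 and dump it, then submit run4, which restarts from that dump.
simulate --steps S1 --dump S1.state
sbatch run4.sh                 # S2 (speed=5) branch, loads S1.state

# Compute S2 (speed=2) and dump it, then submit run2 and run3.
simulate --steps S2 --load S1.state --set S2.speed=2 --dump S2_speed2.state
sbatch run2.sh                 # S3 (height=20), loads S2_speed2.state
sbatch run3.sh                 # S3 (height=40), loads S2_speed2.state

# Finish run1 itself: S3 (height=12) and S4.
simulate --steps S3,S4 --load S2_speed2.state --set S3.height=12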
Another solution I can see is to break the long chains of computation into multiple dependent jobs. The list of jobs to submit and their dependencies would basically be the tree I drew above (except that the S3/S4 pairs would be merged into a single job). That is 8 jobs to submit instead of 5, but I can submit them all at once from the beginning, with the right dependencies. However, I am not sure what the advantages of this approach would be. Will Slurm do a better job as a scheduler if it knows the full list of jobs and their dependencies right from the start? Are there advantages from a user point of view to having all the jobs submitted and linked with dependencies (e.g., to cancel all the jobs that depend on the root job)? I know I can submit many jobs at once with a job array, but I don't see a way to declare dependencies between jobs of the same array. Is it possible, or even advisable?
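With this second approach, the submission script would capture job IDs with --parsable and wire them together with --dependency, roughly like this (the job script names are placeholders):

#!/bin/bash
# Submit the whole tree at once; each job becomes eligible only when the
# job producing its input dump has completed successfully (afterok).
s1=$(sbatch --parsable s1.sh)

s2_a=$(sbatch --parsable --dependency=afterok:$s1 s2_speed2.sh)
s2_b=$(sbatch --parsable --dependency=afterok:$s1 s2_speed5.sh)

# The S3/S4 pairs are merged into single jobs, as described above.
sbatch --dependency=afterok:$s2_a s3s4_speed2_height12.sh
sbatch --dependency=afterok:$s2_a s3s4_speed2_height20.sh
sbatch --dependency=afterok:$s2_a s3s4_speed2_height40.sh
sbatch --dependency=afterok:$s2_b s3s4_speed5_height12.sh
sbatch --dependency=afterok:$s2_b s3s4_speed5_height80.sh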
Finally, are there other approaches I did not think about?
The example I gave is of course greatly simplified. The real simulations will contain hundreds of steps, with about a thousand variations to try, so the scalability of the chosen solution is important.
Upvotes: 1
Views: 262
Reputation: 3711
Another possible solution is to make use of pipeline tools. In the field of bioinformatics, SnakeMake is becoming really popular. SnakeMake is based on GNU Make but written in Python, hence the name SnakeMake. With SnakeMake, you specify which output you want, and SnakeMake deduces which rules it has to run to produce that output. One of the nice things about SnakeMake is that it scales really easily from personal laptops to bigger computers, and even clusters (for instance Slurm clusters). Your example would look something like this:
rule all:
    input:
        ['S4_speed_2_height_12.out',
         'S4_speed_2_height_20.out',
         'S4_speed_2_height_40.out',
         'S4_speed_5_height_12.out',
         'S4_speed_5_height_80.out']

rule S1:
    output:
        "S1.out"
    shell:
        "touch {output}"  # do your heavy computations here

rule S2:
    input:
        "S1.out"
    output:
        "S2_speed_{speed}.out"
    shell:
        "touch {output}"

rule S3:
    input:
        "S2_speed_{speed}.out"
    output:
        "S3_speed_{speed}_height_{height}.out"
    shell:
        "touch {output}"

rule S4:
    input:
        "S3_speed_{speed}_height_{height}.out"
    output:
        "S4_speed_{speed}_height_{height}.out"
    shell:
        "touch {output}"
We can then ask snakemake to make a pretty image of how it would perform these computations:
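The image is drawn with Graphviz; assuming it is installed, something like the following generates it:

snakemake --dag | dot -Tsvg > dag.svg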
Snakemake automatically figures out which output can be used by different rules.
Running this on your local machine is as simple as executing snakemake, and submitting the actions to Slurm is just snakemake --cluster "sbatch".
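For larger workloads you would typically also cap the number of jobs SnakeMake keeps in the Slurm queue and forward per-rule resources to sbatch, something along these lines (the thread and time values here are just placeholders):

snakemake --jobs 100 --cluster "sbatch --cpus-per-task={threads} --time=02:00:00"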
The example I gave is obviously an oversimplification, but SnakeMake is highly customizable (number of threads per rule, memory usage, etc.), and has the advantage that it is based on Python. It takes a bit of figuring out how everything works in SnakeMake, but I can definitely recommend it.
Upvotes: 2
Reputation: 59260
One solution I can think of is that each run is responsible for submitting the jobs of the runs that depend on it, after dumping the intermediate state. With this solution, is there any point in declaring the dependency when submitting the job to Slurm?
This is an approach often followed with simple workflows that involve long-running jobs that must checkpoint and restart.
Another solution I can see is to break the long chains of computation into multiple dependent jobs. Will Slurm do a better job as a scheduler if it knows the full list of jobs and their dependencies right from the start?
No. Slurm will simply ignore the jobs that are not yet eligible to start because the jobs they depend on have not finished.
Are there advantages from a user point of view to having all the jobs submitted and linked with dependencies (e.g., to cancel all the jobs that depend on the root job)?
Yes, but that is only marginally useful.
I know I can submit many jobs at once with a job array, but I don't see a way to declare dependencies between jobs of the same array. Is it possible, or even advisable?
No, you cannot set dependencies between jobs of the same array.
Finally, are there other approaches I did not think about?
You could use a workflow management system.
One of the simplest solutions is Makeflow. It uses files that look like classical Makefiles to describe the dependencies between jobs. Then, simply running something like makeflow -T slurm makefile.mf executes the whole workflow, submitting the tasks to Slurm.
Another option is Bosco. Bosco offers a few more possibilities and is good for personal use. It is easy to set up and can submit jobs to multiple clusters.
Finally, Fireworks is a very powerful solution. It requires a MongoDB server and is better suited for lab-wide use, but it can implement very complex logic for job submission and resubmission based on job outputs, and can handle errors in a clever way. You can, for instance, implement a workflow where a job is submitted with a given value of a parameter, have Fireworks monitor convergence from the output file, and cancel and re-submit the job with another value if convergence is not satisfactory.
Upvotes: 2