Reputation: 3765
We use Slurm workload manager to submit jobs to our high performance cluster. During runtime of a job, we need to copy the input files from a network filesystem to the node's local filesystem, run our analysis there and then copy the output files back to the project directory on the network filesystem.
While the workflow management system Snakemake integrates with Slurm (by defining profiles) and allows to run each rule/step in the workflow as Slurm job, I haven't found a simple way to specify for each rule, wether a tmp folder should be used (with all the implications stated above or not.
I am very happy for simple solutions how to realise this behaviour.
Upvotes: 4
Views: 555
Reputation: 3711
I am not entirely sure if I understand correctly. I am guessing you do not want to copy the input of each rule to a certain directory, do the rule, then copy the output back to another filesystem, since that would be a lot of unnecessary files moving around. So for the first half of the answer I assume before execution you move your files to /scratch/mydir
.
I believe you could use the --directory
command (https://snakemake.readthedocs.io/en/stable/executing/cli.html). However I find this works poorly, since then snakemake has difficulty finding the config.yaml
and samples.tsv
.
The way I solve this is just by adding a working dir in front of my paths in each rule...
rule example:
input:
config["cwd"] + "{sample}.txt"
output:
config["cwd"] + "processed/{sample}.txt"
shell:
"""
touch {output}
"""
So all you then have to do is change cwd in your config.yaml
.
local:
cwd: ./
slurm:
cwd: /scratch/mydir
You would then have to manually copy them back to your long-term filesystem or make a rule that would do that for you.
Now if however you do want to copy your files from filesystem A -> B, do your rule, and then move the result from B -> A, then I think you want to make use of shadow rules. I think the docs properly explain how to use that so I just give a link :).
Upvotes: 2