Reputation: 65
Generating output based on changed input files in Snakemake is easy:
rule all:
input: [f'out_{i}.txt' for i in range(10)]
rule make_input:
output: 'in_{i}.txt'
shell: 'touch {output}'
rule make_output_parallel:
input: 'in_{i}.txt'
output: 'out_{i}.txt'
shell: 'touch {output}'
In this case, make_output
will only run for instances where in_{i}.txt
have changed.
But suppose the 'out_{i}.txt'
cannot be generated in parallel and I want to generate them in a single step, like,
rule make_output_one_step:
input: [f'in_{i}.txt' for i in range(10)]
output: [f'out_{i}.txt' for i in range(10)]
shell: 'touch {output}'
If only one of the in_{i}.txt
files have changed, I don't need to regenerate all 10 of them.
How can I adjust make_output_one_step.output
to generate only the needed files?
Upvotes: 0
Views: 170
Reputation: 3701
If you want some parts of the pipeline to not work in parallel for whatever reason (RAM, internet usage, IO, API limit, etc....) you can make use of resources.
rule all:
input: [f'out_{i}.txt' for i in range(10)]
rule make_input:
output: 'in_{i}.txt'
shell: 'touch {output}'
rule make_output:
input: 'in_{i}.txt'
output: 'out_{i}.txt'
resources: max_parallel=1
shell: 'touch {output}'
And then you can call your pipeline like snakemake --resources max_parallel=1 --cores 10
. In this case all the jobs of rule make_input
will run in parallel, but only one instance of make_output
will run in parallel.
Upvotes: 1