goi42

snakemake batch creation of output

In Snakemake it is easy to regenerate only the outputs whose input files have changed:

rule all:
    input: [f'out_{i}.txt' for i in range(10)]
rule make_input:
    output: 'in_{i}.txt'
    shell: 'touch {output}'
rule make_output_parallel:
    input: 'in_{i}.txt'
    output: 'out_{i}.txt'
    shell: 'touch {output}'

In this case, make_output_parallel will only run for the instances whose in_{i}.txt has changed.
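As a rough model of why only the changed instances rerun: by default Snakemake treats an output as outdated when it is missing or older than one of its inputs (newer Snakemake versions also consider other rerun triggers, such as changed code or params). The sketch below models only the modification-time part. `needs_rerun` and `touch` are hypothetical helpers for illustration, not Snakemake functions.

```python
import os
import tempfile
import time


def needs_rerun(inp: str, out: str) -> bool:
    """Simplified mtime rule (assumption): rerun if the output is missing or older than its input."""
    if not os.path.exists(out):
        return True
    return os.path.getmtime(inp) > os.path.getmtime(out)


def touch(path: str, mtime: float) -> None:
    """Create the file if needed and set its modification time explicitly."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as d:
    now = time.time()
    for i in range(3):
        touch(os.path.join(d, f"in_{i}.txt"), now - 100)
        touch(os.path.join(d, f"out_{i}.txt"), now - 50)
    # Simulate editing in_1.txt after out_1.txt was produced.
    touch(os.path.join(d, "in_1.txt"), now)
    rerun = [
        i for i in range(3)
        if needs_rerun(os.path.join(d, f"in_{i}.txt"), os.path.join(d, f"out_{i}.txt"))
    ]

print(rerun)  # [1]
```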

But suppose the out_{i}.txt files cannot be generated in parallel, and I want to generate them all in a single step, like this:

rule make_output_one_step:
    input: [f'in_{i}.txt' for i in range(10)]
    output: [f'out_{i}.txt' for i in range(10)]
    shell: 'touch {output}'

If only one of the in_{i}.txt files has changed, I don't need to regenerate all 10 outputs. How can I adjust make_output_one_step.output so that only the needed files are generated?
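Stripped of Snakemake, the selection the question asks for amounts to: collect the indices whose output is missing or older than its input, then regenerate just those in one batch call. The plain-Python sketch below shows that selection logic under an mtime-only staleness assumption. It is not something make_output_one_step does by itself, and `stale_indices`, `make_outputs_in_one_step` and `set_mtime` are hypothetical names.

```python
import os
import tempfile
import time
from typing import List


def stale_indices(directory: str, n: int) -> List[int]:
    """Indices i whose out_{i}.txt is missing or older than in_{i}.txt (mtime-only assumption)."""
    stale = []
    for i in range(n):
        inp = os.path.join(directory, f"in_{i}.txt")
        out = os.path.join(directory, f"out_{i}.txt")
        if not os.path.exists(out) or os.path.getmtime(inp) > os.path.getmtime(out):
            stale.append(i)
    return stale


def make_outputs_in_one_step(directory: str, indices: List[int]) -> None:
    """Hypothetical batch step: (re)create only the requested outputs, like `touch`."""
    for i in indices:
        path = os.path.join(directory, f"out_{i}.txt")
        with open(path, "a"):
            pass
        os.utime(path, None)  # set mtime to the current time


def set_mtime(path: str, mtime: float) -> None:
    """Create the file if needed and give it a fixed modification time."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as d:
    now = time.time()
    for i in range(10):
        set_mtime(os.path.join(d, f"in_{i}.txt"), now - 100)
        if i != 4:  # leave out_4.txt missing
            set_mtime(os.path.join(d, f"out_{i}.txt"), now - 50)
    set_mtime(os.path.join(d, "in_7.txt"), now - 10)  # in_7.txt edited after out_7.txt

    batch = stale_indices(d, 10)  # only these indices go into the single step
    make_outputs_in_one_step(d, batch)
    after = stale_indices(d, 10)

print(batch, after)  # [4, 7] []
```

The batch step receives a list of indices instead of a fixed range(10), which is the piece a one-step tool would need to support for this approach to pay off.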

Answers (1)

Maarten-vd-Sande

If you want some parts of the pipeline not to run in parallel for whatever reason (RAM, internet usage, IO, API limits, etc.), you can make use of resources.

rule all:
    input: [f'out_{i}.txt' for i in range(10)]

rule make_input:
    output: 'in_{i}.txt'
    shell: 'touch {output}'

rule make_output:
    input: 'in_{i}.txt'
    output: 'out_{i}.txt'
    resources: max_parallel=1
    shell: 'touch {output}'

And then you can call your pipeline like snakemake --resources max_parallel=1 --cores 10. In this case, the jobs of rule make_input can run in parallel (up to the 10 cores), but only one instance of make_output will run at a time.
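To illustrate what --resources max_parallel=1 buys you, here is a toy model in plain Python, not Snakemake's actual scheduler. A pool of 10 workers stands in for --cores 10, and a semaphore of capacity 1 stands in for the global max_parallel budget that each make_output job draws one unit from; make_input jobs only need a worker. Unlike Snakemake, a job waiting for the budget here still occupies a worker thread, and the two rules run as separate phases instead of interleaving by dependency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CORES = 10         # stands in for --cores 10
MAX_PARALLEL = 1   # stands in for --resources max_parallel=1

budget = threading.Semaphore(MAX_PARALLEL)
lock = threading.Lock()
running = {"make_input": 0, "make_output": 0}
peak = {"make_input": 0, "make_output": 0}


def job(rule: str, needs_budget: bool) -> None:
    """Simulate one job; jobs declaring max_parallel=1 must hold one unit of the budget."""
    if needs_budget:
        budget.acquire()
    try:
        with lock:
            running[rule] += 1
            peak[rule] = max(peak[rule], running[rule])
        time.sleep(0.05)  # pretend to do some work
        with lock:
            running[rule] -= 1
    finally:
        if needs_budget:
            budget.release()


with ThreadPoolExecutor(max_workers=CORES) as pool:
    for f in [pool.submit(job, "make_input", False) for _ in range(10)]:
        f.result()
    for f in [pool.submit(job, "make_output", True) for _ in range(10)]:
        f.result()

print(peak)  # make_output never exceeds 1 concurrent job
```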
