Howard Sherman

Reputation: 99

Snakemake: expand params

I am trying to build a simple workflow to feed a list of parameters to a script. To illustrate:

SAMPLES=['A','B']

rule test:
    params:
        sample=expand("{sample}", sample=SAMPLES)
    script:
        "test.py {params.sample}"

However, snakemake only executes the script with sample A, not B. In other words, I believe it is executing python test.py A B, not python test.py A and then python test.py B. Similarly, I think this is illustrated by:

SAMPLES=['A','B']

rule print_samples:
    params:
        sample=expand("{sample}", sample=SAMPLES)
    script:
        "echo {params.sample} \n"

I would expect to see A and B printed out on separate lines, but instead it prints A B on the same line.

Am I missing something about the way expand works with params? Ideally I would like to add the -j flag to run them in parallel (at the moment -j simply executes with A alone).

Upvotes: 4

Views: 4267

Answers (1)

Troy Comi

Reputation: 2059

That is the expected output. Expand in this case is just a wrapper for

[str(sample) for sample in SAMPLES]

which, when substituted into a shell command, is rendered as the items joined by spaces: A B.
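
To make that concrete, here is a minimal plain-Python sketch of what happens to params.sample. The helper expand_one is a hypothetical stand-in that only handles a single wildcard; it is not snakemake's actual implementation, just an approximation of its behavior for this case.

```python
SAMPLES = ["A", "B"]


def expand_one(pattern, **wildcards):
    """Hypothetical stand-in for snakemake's expand (single wildcard only)."""
    name, values = next(iter(wildcards.items()))
    # One formatted string per value, like the list comprehension above.
    return [pattern.format(**{name: str(value)}) for value in values]


params_sample = expand_one("{sample}", sample=SAMPLES)

# A list placed into a shell command is joined with spaces,
# so the rule yields a single command, not one command per sample.
command = "test.py " + " ".join(params_sample)

print(params_sample)  # ['A', 'B']
print(command)        # test.py A B
```

So there is only ever one job here, and it receives both samples as arguments.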

Instead, you want a general rule which will work for any sample (you also need an output file):

rule test:
   output: "{sample}.out"
   shell:
      "test.py {wildcards.sample}"  # no need for params, assume this writes output {sample}.out

Here test.py is an executable. So when you ask for A.out, test.py A runs, for B.out you get test.py B.

Next you have to ask for the outputs you want. This is usually done in the first rule of the Snakefile, conventionally called all:

rule all:
   input: expand('{sample}.out', sample=SAMPLES)

Again, expand will give you a list, this time of output file names, so in your case rule all becomes:

rule all:
   input: 'A.out', 'B.out'

With the output files specified, snakemake determines that the rule test needs to be run twice, once with sample A and once with sample B.
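
If it helps to picture how that works, below is a rough toy model in plain Python. It is only a sketch under my own assumptions (a regex standing in for the "{sample}.out" output pattern) and not how snakemake is actually implemented, but it shows the idea: each requested file is matched against the rule's output pattern, and every match becomes a separate job with its own wildcard value.

```python
import re

SAMPLES = ["A", "B"]

# What rule all asks for: one output file per sample.
targets = [f"{sample}.out" for sample in SAMPLES]

# Toy equivalent of rule test's output pattern "{sample}.out".
output_pattern = re.compile(r"(?P<sample>.+)\.out")

jobs = []
for target in targets:
    match = output_pattern.fullmatch(target)
    if match:
        # The wildcard value recovered from the file name fills in the command.
        jobs.append(f"test.py {match.group('sample')}")

print(jobs)  # ['test.py A', 'test.py B']
```

Each entry in jobs is independent of the others, which is why they can run at the same time when more cores are available.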

So remember, write your rules as a generalization for any one sample. You may only need a single expand, in rule all, to specialize your rules for every sample. Snakemake is responsible for figuring out what needs to be run, and if you give it additional cores (the -j flag) it can run separate jobs simultaneously.

Upvotes: 5
