Fabian_G
Fabian_G

Reputation: 431

Snakemake using a rule in a loop

I'm trying to use Snakemake rules within a loop so that the rule takes the output of the previous iteration as input. Is that possible and if yes how can I do that?

Here is my example

  1. Setup the test data
mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt
  1. Snakemake
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Trying to run the example yields the ERROR CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone spots an option to get that thing to work it would be great!

Thanks !

Upvotes: 10

Views: 6480

Answers (2)

merv
merv

Reputation: 76700

I think this is a nice opportunity to use recursive programming. Rather than explicitly including conditionals for every iteration, write a single rule that transitions from iteration (n-1) to n. So, something along these lines:

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """

As @RussHyde said, you need to be proactive about ensuring no infinite loops are triggered. To this end, we ensure all cases are covered in recurse_sample and use wildcard_constraints to make sure the matching is precise.

Upvotes: 9

Russ Hyde
Russ Hyde

Reputation: 2269

My understanding is that your rules are converted to python code before they are ran and that all the raw python code present in your Snakefile is ran sequentially during this process. Think of it as your snakemake rules being evaluated as python functions.

But there's a constraint that any rule can only be evaluated to a function once.

You can have if/else expressions and differentially evaluate a rule (once) based on config values etc, but you can't evaluate a rule multiple times.

I'm not really sure how to rewrite your Snakefile to achieve what you want. Is there a real example that you could give where looping constructs appear to be required?

--- Edit

For fixed number of iterations, it may be possible to use an input-function to run the rule several times. (I would caution against doing this though, be extremely careful to disallow infinite loops)

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if (wildcards["prefix"] == "loop0"):
        input = "test/{}.txt".format(wildcards["sample"])
    else if (wildcards["prefix"] == "loop1"):
        input = "loop0/{}.txt".format(wildcards["sample"])
    ...
    return input


rule looper:
    input:
            looper_input
    output:
            "{prefix}/{sample}.txt"
    params:
            # ? should this be add="{prefix}" ?
            add=prefix
    shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Upvotes: 6

Related Questions