dariober

Reputation: 9062

snakemake: Ambiguous rule not detected?

The following Snakefile fails with AmbiguousRuleException:

library_id = ['S1']
run_id = ['R1']

samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('{library_id}.bam', library_id= library_id),

rule bwa:
    output:
        '{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: '%s.bam' % samples[wc.library_id],
    output:
        '{library_id}.bam',

Gives:


    AmbiguousRuleException:
    Rules bwa and merge_bam are ambiguous for the file S1.bam.
    Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
    Wildcards:
        bwa: run_id=S1
        merge_bam: library_id=S1
    Expected input files:
        bwa: 
        merge_bam: R1.bam
    Expected output files:
        bwa: S1.bam
        merge_bam: S1.bam

That's expected and it's OK. However, if library_id and run_id have the same value, the ambiguity is not detected and only the first rule is executed:

library_id = ['S1']
run_id = ['S1'] # Same as library_id!

samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('{library_id}.bam', library_id= library_id),

rule bwa:
    output:
        '{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: '%s.bam' % samples[wc.library_id],
    output:
        '{library_id}.bam',

Dry-run execution:

Job counts:
    count   jobs
    1   all
    1   bwa
    2

[Mon Aug 23 11:27:39 2021]
localrule bwa:
    output: S1.bam
    jobid: 1
    wildcards: run_id=S1

[Mon Aug 23 11:27:39 2021]
localrule all:
    input: S1.bam
    jobid: 0

Job counts:
    count   jobs
    1   all
    1   bwa
    2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Is this a bug or am I missing something? The second example should raise an AmbiguousRuleException just like the first; if anything, the ambiguity is even more obvious.

This is with snakemake 6.4.1.

Upvotes: 1

Views: 167

Answers (1)

dofree

Reputation: 421

TL;DR

Snakemake performs some checks for cycles, and jobs with the same input and output file(s) are removed from consideration during DAG creation. In your working case, the job from the merge_bam rule has the same input and output file (S1.bam), so it is not considered in the DAG and there is no ambiguity when satisfying the input of the all rule.

Details

Snakemake starts with the final target file (in this case S1.bam) and works backward to find parameterized rules (jobs) that can be executed to create the target file from existing input files. To do this, it recursively calls snakemake/dag.py::DAG.update() and snakemake/dag.py::DAG.update_() to construct the DAG from the initial target file(s). DAG.update() has the following check to remove jobs from consideration if they produce the same output file that they require for input:

if file in job.input:
    cycles.append(job)
    continue

I.e., if the target file is also the candidate job's input file, skip this candidate job.
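
To see how this filter decides your two cases, here is a toy sketch in plain Python. It is an illustration only, not Snakemake's internal data structures:

def producers(target, candidates):
    # Keep only candidate jobs that do not consume the very file
    # they are asked to produce (the cycle check shown above).
    kept = []
    for job in candidates:
        if target in job['input']:
            continue
        kept.append(job)
    return kept

# Exception case: merge_bam needs R1.bam, so both candidates survive
# and Snakemake has to raise AmbiguousRuleException.
print(producers('S1.bam', [
    {'rule': 'bwa', 'input': []},
    {'rule': 'merge_bam', 'input': ['R1.bam']},
]))

# "Working" case: merge_bam would need S1.bam itself, so it is dropped
# and only bwa remains, hence no ambiguity to report.
print(producers('S1.bam', [
    {'rule': 'bwa', 'input': []},
    {'rule': 'merge_bam', 'input': ['S1.bam']},
]))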

In your working case, the job from the merge_bam rule is considered as a candidate for producing the S1.bam file requested by the all rule. However, the merge_bam job also requests S1.bam as its own input, so it trips the above check for cycles. Consequently, it is not considered a producer of the S1.bam file requested by the all rule, leaving only the bwa job.

In the exception case, the merge_bam rule outputs S1.bam but asks for R1.bam as input, so it passes the cycle check and is considered a potential producer of the S1.bam file requested by the all rule. Since both merge_bam and bwa can produce S1.bam (and there is no ruleorder defined), an AmbiguousRuleException is thrown.
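
As the error message itself suggests, the ambiguity in that case can be resolved explicitly. A minimal sketch using the ruleorder directive (added to the Snakefile):

# Prefer merge_bam whenever both rules could produce the same file.
ruleorder: merge_bam > bwa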

Conclusions

The combination of a cyclic DAG and ambiguous rules causes this unintuitive behavior. Snakemake doesn't aim to find all possible rule ambiguities, so I would not necessarily call this a bug.
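
A way to avoid both the accidental cycle and the ambiguity is the other suggestion from the error message: give each rule's output a unique prefix. A sketch of the original Snakefile, where the 'aligned/' and 'merged/' directory names are illustrative:

library_id = ['S1']
run_id = ['S1']

samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('merged/{library_id}.bam', library_id=library_id),

rule bwa:
    output:
        'aligned/{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: 'aligned/%s.bam' % samples[wc.library_id],
    output:
        'merged/{library_id}.bam',

With distinct prefixes, merge_bam's input can never equal its output, so the cycle check never drops it, and bwa and merge_bam can never claim the same target file in the first place.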

Upvotes: 1
