Reputation: 9062
The following Snakefile fails with AmbiguousRuleException:
library_id = ['S1']
run_id = ['R1']
samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('{library_id}.bam', library_id=library_id),

rule bwa:
    output:
        '{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: '%s.bam' % samples[wc.library_id],
    output:
        '{library_id}.bam',
Gives:

AmbiguousRuleException:
Rules bwa and merge_bam are ambiguous for the file S1.bam.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    bwa: run_id=S1
    merge_bam: library_id=S1
Expected input files:
    bwa:
    merge_bam: R1.bam
Expected output files:
    bwa: S1.bam
    merge_bam: S1.bam
That's expected and it's OK. However, if library_id and run_id have the same value, the ambiguity is not detected and only the first rule is executed:
library_id = ['S1']
run_id = ['S1']  # Same as library_id!
samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('{library_id}.bam', library_id=library_id),

rule bwa:
    output:
        '{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: '%s.bam' % samples[wc.library_id],
    output:
        '{library_id}.bam',
Dry-run execution:

Job counts:
    count    jobs
    1        all
    1        bwa
    2

[Mon Aug 23 11:27:39 2021]
localrule bwa:
    output: S1.bam
    jobid: 1
    wildcards: run_id=S1

[Mon Aug 23 11:27:39 2021]
localrule all:
    input: S1.bam
    jobid: 0

Job counts:
    count    jobs
    1        all
    1        bwa
    2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Is this a bug or am I missing something? The second example should give an AmbiguousRuleException just like the first; if anything, the ambiguity there is even more obvious.
This is with snakemake 6.4.1.
Upvotes: 1
Views: 167
Reputation: 421
Snakemake performs some checks for cycles, and jobs with the same input and output file(s) are removed from consideration during DAG creation. In your working case, the job from the merge_bam rule has the same input/output file (S1.bam), so it is not considered in the DAG and there is no ambiguity when satisfying the input of the all rule.
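You can watch this decision being made while the DAG is built: Snakemake's --debug-dag flag prints candidate and selected jobs (including their wildcards) during DAG inference, so you can see that merge_bam never gets selected as a producer of S1.bam in the working case. For example:

snakemake -n --debug-dag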
Snakemake starts with the final target file (in this case S1.bam) and works backward to find parameterized rules (jobs) that can be executed to create the target file from existing input files. To do this, it recursively calls snakemake/dag.py::DAG.update() and snakemake/dag.py::DAG.update_() to construct the DAG from the initial target file(s). DAG.update() has the following check to remove jobs from consideration if they produce the same output file that they require as input:

if file in job.input:
    cycles.append(job)
    continue

I.e., if the target file is also the candidate job's input file, skip that candidate job.
In your working case, the job from the merge_bam rule is considered as a candidate for producing the S1.bam file requested by the all rule. However, the merge_bam job also requests S1.bam as its own input, so it fails the above check for cycles. Consequently, it is not considered a producer of the S1.bam file requested by the all rule, leaving only the bwa job.
In the exception case, the merge_bam rule outputs S1.bam but asks for R1.bam as input, so it passes the cycle check and is considered a potential producer of the S1.bam file requested by the all rule. Since both merge_bam and bwa can produce S1.bam (and there is no ruleorder defined), an AmbiguousRuleException is thrown.
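To make the two outcomes concrete, here is a minimal standalone sketch in plain Python. The dicts and the producers() helper are illustrative stand-ins for Snakemake's actual Job objects and candidate filtering, not its real code; only the cycle check itself mirrors the snippet above:

# Hypothetical stand-ins for the candidate jobs considered for 'S1.bam'.
target = 'S1.bam'

# Working case: samples = {'S1': 'S1'}, so merge_bam's input equals its output.
working = {
    'bwa':       {'input': [],         'output': ['S1.bam']},
    'merge_bam': {'input': ['S1.bam'], 'output': ['S1.bam']},
}

# Exception case: samples = {'S1': 'R1'}, so merge_bam asks for R1.bam.
exception = {
    'bwa':       {'input': [],         'output': ['S1.bam']},
    'merge_bam': {'input': ['R1.bam'], 'output': ['S1.bam']},
}

def producers(candidates):
    # Mirrors the 'if file in job.input' cycle check: a job that needs the
    # target file as its own input is skipped as a potential producer.
    return [name for name, job in candidates.items()
            if target in job['output'] and target not in job['input']]

print(producers(working))    # ['bwa']              -> one producer, no ambiguity
print(producers(exception))  # ['bwa', 'merge_bam'] -> AmbiguousRuleException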
The mix of a cyclic DAG and ambiguous rules causes this unintuitive behavior. Snakemake doesn't aim to find all possible rule ambiguities, so I would not necessarily call this a bug.
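If you want the ambiguity resolved explicitly rather than silently hidden, the exception message's own suggestions apply. One sketch, following its "unique prefix" advice (the merged/ prefix is my own illustrative choice, not from the original post), gives merge_bam an output pattern that can never collide with bwa's, which also removes the hidden cycle when library_id == run_id:

library_id = ['S1']
run_id = ['S1']
samples = dict(zip(library_id, run_id))

rule all:
    input:
        expand('merged/{library_id}.bam', library_id=library_id),

rule bwa:
    output:
        '{run_id}.bam',

rule merge_bam:
    input:
        lambda wc: '%s.bam' % samples[wc.library_id],
    output:
        'merged/{library_id}.bam',

A ruleorder directive or wildcard_constraints would also work, as the exception message says, but distinct output patterns make the intent clearest.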
Upvotes: 1