luomengt

Reputation: 41

Force a certain rule to execute at the end

My question is very similar to this one.

I am writing a Snakemake pipeline that does a lot of pre- and post-alignment quality control. At the end of the pipeline, I run MultiQC on those QC results.

Basically, the workflow is: preprocessing -> fastqc -> alignment -> post-alignment QCs such as picard, qualimap, and preseq -> peak calling -> motif analysis -> multiQC.

MultiQC should generate a report on all of those outputs, as long as MultiQC supports them.

One way to force MultiQC to run at the very end is to list all the output files of the rules above in the input directive of the multiqc rule, as below:

rule a:
  input: "a.input"
  output: "a.output"
  
rule b:
  input: "b.input"
  output: "b.output"
  
rule c:
  input: "b.output"
  output: "c.output"
  
rule multiqc:
  input: "a.output", "c.output"
  output: "multiqc.output"

However, I want a more flexible approach that doesn't depend on specific upstream output files. That way, when I change the pipeline (adding or removing rules), I don't need to change the dependencies of the multiqc rule. The input to multiqc should simply be a directory containing all the files that I want MultiQC to scan.

In my situation, how can I force the multiqc rule to execute at the very end of the pipeline? Or is there a general way to force a certain rule in Snakemake to run as the last job? Perhaps through some Snakemake configuration, so that no matter how I change the pipeline, this rule executes at the end. I am not sure whether such a method exists.

Thanks very much for helping!

Upvotes: 3

Views: 686

Answers (2)

luomengt

Reputation: 41

It seems that the onsuccess handler in Snakemake is what I am looking for.
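
A minimal sketch of what that could look like in a Snakefile. The directory names `qc_results/` and `multiqc_report/` are hypothetical placeholders, not taken from the pipeline above, and the sketch assumes every QC rule writes its results somewhere under the scanned directory:

```snakemake
# Sketch only: qc_results/ and multiqc_report/ are hypothetical paths.
onsuccess:
    # This block runs once, after all jobs of the workflow finished without error.
    # MultiQC scans the given directory and reports on whatever it recognises,
    # so no upstream output file has to be named explicitly.
    shell("multiqc qc_results/ -o multiqc_report/")
```

Note that the handler is not part of the job graph, so the MultiQC report is not tracked as a rule output and no other rule can depend on it.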

Upvotes: 1

Cornelius Roemer

Reputation: 8108

From your comments I gather that what you really want is to run a flexibly configured set of QC methods and then summarise them at the end. The summary should run only once all the QC methods you selected have completed.

Rather than manually forcing the MultiQC rule to execute last, you can set it up so that it automatically runs last, by declaring the QC methods' outputs as its input.

Your goal of flexibly configuring which QC rules to run can be achieved by passing the names of the QC rules through a config file or, even more easily, as a command line argument.

Here is a minimal working example for you to extend:

###Snakefile###

rule end:
    input: 'start.out',
           expand('opt_{qc}.out', qc=config['qc'])

rule start:
    output: 'start.out'

rule qc_a:
    input: 'start.out'
    output: 'opt_a.out'
    #shell: #whatever qc method a needs here

rule qc_b:
    input: 'start.out'
    output: 'opt_b.out'
    #shell: #whatever qc method b needs here

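This is how you configure which QC methods to run (the config value is quoted so that the shell does not interpret the brackets or strip the inner quotes):

snakemake -npr end --config 'qc=["b"]'  #run just method b
snakemake -npr end --config 'qc=["a","b"]'  #run methods a and b
snakemake -npr end --config 'qc=[]'  #run no QC method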
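
One caveat with the Snakefile above is that `config['qc']` raises a `KeyError` when no `qc` value is passed at all. The following is a rough sketch of a plain-Python helper that could sit at the top of the Snakefile to build the input list more defensively. The function name `final_inputs` and its parameters are made up for illustration and are not part of Snakemake:

```python
def final_inputs(config, always=("start.out",), pattern="opt_{qc}.out"):
    """Build the input list of the final rule from the selected QC methods.

    `final_inputs`, `always` and `pattern` are hypothetical names chosen
    for this sketch; adapt them to the file layout of your pipeline.
    """
    # Fall back to an empty list when no `qc` key was supplied,
    # instead of failing with a KeyError.
    selected = config.get("qc", [])
    # A single method passed as a bare string (qc=b) is treated as one item.
    if isinstance(selected, str):
        selected = [selected]
    # Files that are always required, plus one output per selected QC method.
    return list(always) + [pattern.format(qc=qc) for qc in selected]
```

With this in place, the input of rule end could be written as `input: final_inputs(config)`, and running the workflow without any `--config qc=...` argument would simply select no QC method.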

Upvotes: 1
