Noah_Seagull

Reputation: 377

Is it possible to add a conditional statement in Snakemake's rule all?

I want to run multiple Snakefiles, called qc.smk, dada2.smk, and picrust2.smk, using Singularity. There is also one Snakefile, longitudinal.smk, that I would like to run conditionally, for example only when longitudinal data is being used.

# set vars

LONGITUDINAL = config['perform_longitudinal']

rule all:
  input:
    # fastqc output before trimming
    raw_html = expand("{scratch}/fastqc/{sample}_{num}_fastqc.html", scratch = SCRATCH, sample=SAMPLE_SET, num=SET_NUMS),
    raw_zip = expand("{scratch}/fastqc/{sample}_{num}_fastqc.zip", scratch = SCRATCH, sample=SAMPLE_SET, num=SET_NUMS),
    raw_multi_html = SCRATCH + "/fastqc/raw_multiqc.html",
    raw_multi_stats = SCRATCH + "/fastqc/raw_multiqc_general_stats.txt"

# there are many more files in rule all

##### setup singularity #####

singularity: "docker://continuumio/miniconda3"

##### load rules #####

include: "rules/qc.smk"
include: "rules/dada2.smk"
include: "rules/phylogeny.smk"
include: "rules/picrust2.smk"

if LONGITUDINAL == 'yes':
    include: 'rules/longitudinal.smk'
    print("Will perform a longitudinal analysis")
else:
    print("no longitudinal analysis")

The code above works only when I am running a longitudinal dataset. However, when I am not running the longitudinal analysis, Snakemake fails with an error like:

MissingInputException in line 70 of /mnt/c/Users/noahs/projects/tagseq-qiime2-snakemake-1/Snakefile:
Missing input files for rule all:

I think that if I could add a conditional statement to rule all, similar to the one I use to include the external Snakefile, Snakemake would not complain about the longitudinal Snakefile not being included.

Upvotes: 2

Views: 5432

Answers (2)

Noah_Seagull

Reputation: 377

Solution for merging lists from expand statements:

I used a configuration file to pass the setting to the Snakefile:

## config.yaml ##
# longitudinal analysis
perform_longitudinal: 'yes' # yes for longitudinal analysis 

When 'yes' is entered in the configuration, Snakemake includes additional files in rule all and runs an additional Snakefile to generate them. The workflow ended up being split across multiple Snakefiles, which the main Snakefile pulls in with include statements, so its rule all can list the input files for all 6 Snakefiles. Singularity only provides the container that each job runs in.
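A minimal sketch, assuming you want the flag to tolerate both the quoted string 'yes' and an unquoted `yes`, which a YAML 1.1 parser such as PyYAML reads as the boolean `True`. The helper name `is_enabled` is hypothetical, not part of Snakemake; in a Snakefile you would write `LONGITUDINAL = is_enabled(config['perform_longitudinal'])` and then test `if LONGITUDINAL:`.

```python
def is_enabled(value):
    """Interpret a config flag such as perform_longitudinal as a boolean.

    Accepts real booleans (e.g. an unquoted YAML `yes` parsed as True)
    as well as common string spellings like 'yes', 'Yes', 'true', '1'.
    """
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"yes", "y", "true", "1", "on"}


# Simulate the dict Snakemake builds from config.yaml
config = {"perform_longitudinal": "yes"}
LONGITUDINAL = is_enabled(config["perform_longitudinal"])
print(LONGITUDINAL)
```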

## Snakefile ##

configfile: "config.yaml"

LONGITUDINAL = config['perform_longitudinal']

# rule all input files ("file.txt" is a placeholder; in the real workflow
# these are paths or expand() lists)
raw_html = "file.txt"
raw_zip = "file.txt"
raw_multi_html = "file.txt"
raw_multi_stats = "file.txt"
longitudinal_analysis_files = "file.txt"

# rule all files excluding longitudinal analysis
# (use the variables themselves, not their names as strings)
rule_all_input_list = [raw_html, raw_zip, raw_multi_html, raw_multi_stats]

# longitudinal analysis files
rule_all_longitudinal_input = [longitudinal_analysis_files]

if LONGITUDINAL == 'yes':

    rule_all_input_list.extend(rule_all_longitudinal_input)

    # conditionally add Snakefile to workflow
    include: 'rules/longitudinal.smk'

    print("Will perform a longitudinal analysis")

else:
    print("no longitudinal analysis")


rule all:
    input:
        data = rule_all_input_list

##### setup singularity #####

# this container defines the underlying OS for each job when using the workflow
# with --use-conda --use-singularity
singularity: "docker://continuumio/miniconda3"

##### load rules #####

include: "rules/qc.smk"
include: "rules/dada2.smk"
include: "rules/phylogeny.smk"
include: "rules/picrust2.smk"
include: "rules/differential.smk"

I have a less simplified version of how I got this working on GitHub https://github.com/nasiegel88/tagseq-qiime2-snakemake-1
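Snakemake flattens nested lists in a rule's input, which is why expand() results (lists) and single paths can sit side by side in rule_all_input_list. The plain-Python sketch below mimics that behavior so it can be run outside Snakemake; `expand_like` is a hypothetical stand-in for Snakemake's expand(), `flatten` imitates the flattening, and the sample names and longitudinal path are made-up placeholders.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every
    combination of the given wildcard value lists."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


def flatten(items):
    """Flatten nested lists, mirroring how Snakemake flattens rule inputs."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Placeholder values standing in for SCRATCH, SAMPLE_SET and SET_NUMS
SCRATCH = "scratch"
SAMPLE_SET = ["s1", "s2"]
SET_NUMS = ["1", "2"]
LONGITUDINAL = "yes"

raw_html = expand_like("{scratch}/fastqc/{sample}_{num}_fastqc.html",
                       scratch=[SCRATCH], sample=SAMPLE_SET, num=SET_NUMS)
raw_multi_html = SCRATCH + "/fastqc/raw_multiqc.html"

# A list of lists and strings, as in the answer's rule_all_input_list
rule_all_input_list = [raw_html, raw_multi_html]
if LONGITUDINAL == "yes":
    # placeholder longitudinal target, only requested when enabled
    rule_all_input_list.append(SCRATCH + "/longitudinal/summary.txt")

print(flatten(rule_all_input_list))
```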

Upvotes: 1

Maarten-vd-Sande

Reputation: 3701

You can define a list (or dict) of the outputs you want outside of rule all and feed it to the input. Something like this works:

myoutput = list()

# condition_1 and condition_2 are placeholders, e.g. values read from config
if condition_1:
    myoutput.append("file_1.txt")
if condition_2:
    myoutput.append("file_2.txt")

rule all:
    input:
        myoutput
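The dict variant mentioned above can be sketched the same way. This is a minimal plain-Python illustration with hypothetical config keys (`qc`, `longitudinal`) and target names; each target is added under its own name only when its condition holds. In a Snakefile such a dict could then be unpacked into the input with `**myoutput`, which is an assumption worth checking against your Snakemake version.

```python
# Hypothetical flags; in a Snakefile these would typically come from config
config = {"qc": True, "longitudinal": False}

# Named targets, added only when their condition holds
myoutput = dict()

if config["qc"]:
    myoutput["qc_report"] = "file_1.txt"
if config["longitudinal"]:
    myoutput["longitudinal_report"] = "file_2.txt"

print(myoutput)
```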

edit:

Either place myoutput first in the input of rule all:

rule all:
    input:
        myoutput,
        raw_html = "raw_html_path",
        raw_zip = "raw_zip_path"

or give it a name and place it anywhere:

rule all:
    input:
        raw_html = "raw_html_path",
        myoutput = myoutput,
        raw_zip = "raw_zip_path"

In Python (and Snakemake), positional (unnamed) arguments must always come before named (keyword) arguments.
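The ordering rule is plain Python behavior. The sketch below uses a toy function `rule_input` (hypothetical, not part of Snakemake) that accepts `*positional` and `**named` arguments, loosely like the entries of a rule's input. Writing a named entry before an unnamed one is rejected when the code is parsed, which is why `myoutput` must either come first or be given a name.

```python
def rule_input(*positional, **named):
    """Toy stand-in for a rule's input section: returns what it received."""
    return positional, named


# Positional first, then named: valid
print(rule_input(["file_1.txt"], raw_html="raw_html_path"))

# Named before positional: rejected by the parser before anything runs
try:
    compile('rule_input(raw_html="raw_html_path", ["file_1.txt"])',
            "<example>", "eval")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```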

Upvotes: 4
