zhang
zhang

Reputation: 543

Use config and wildcards in snakemake rule output

I have a list of fasta may be used, but some of them may also not be used, so I hope to use snakemake to index fastq if need

I bulit a yaml file like this

# config.yaml
reference_genome:
 fa1: "path/to/genome"
 fa2: "..."
 fa3: "..."
...

and I write a snakemake like this

configfile: "config.yaml"
rule all:
    input:
        expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index: 
    input: 
        #reference_genomeFile
        ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
    output: 
        expand('{reference_genome}.{type}', reference_genome={reference_genome}, type=['amb', 'ann', 'pac'])
    log: 
        'log/rule_index_{reference_genome}.log'
    shell: 
        "bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"

I hope snakemake can monitor the index file (amb, ann, pac), but this script will raise follow error:

name 'reference_genome' is not defined
  File "/public/...", line ..., in <module>

update: base on @dariober's answer: if we runing with following config.yaml

reference_genome:
 fa1: "genome_1.fa"
 fa2: "genome_2.fa"
 fa3: "genome_3.fa"

I expect the output is

genome_1.fa.{amb, ann, pac}
genome_2.fa.{amb, ann, pac}
genome_3.fa.{amb, ann, pac}

If we use following workaround

rule all:
    input:
        expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])

rule index: 
    input: 
        #reference_genomeFile
        ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
    output: 
        expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
    log: 
        'log/rule_index_{reference_genome}.log'
    shell: 
        "bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"

we will get

$ snakemake -s snakemake_test.smk --configfile config.yaml
# for reference_name is fa1
[Fri Dec  2 17:56:29 2022]
rule index:
    input: genome_1.fa
    output: fa1.amb, fa1.ann, fa1.pac
    log: log/rule_index_fa1.log
    jobid: 1
    wildcards: reference_genome=fa1
...

Thats not my expected output

the output is fa1.amb, fa1.ann, fa1.pac, but I wanted output is genome_1.fa.amb, genome_1.fa.ann, genome_1.fa.pac

Upvotes: 2

Views: 798

Answers (2)

euronion
euronion

Reputation: 1277

Building upon dariober's answer and judging from your comments, I think this is what you are looking for?

configfile: "config.yaml"


rule all:
    input:
        expand(
            "{reference_genome}.{type}",
            reference_genome=list(config["reference_genome"].values()),
            type=["amb", "ann", "pac"],
        ),


rule index:
    input:
        #reference_genomeFile
        ref_genome="{reference_genome}",
    output:
        expand("{{reference_genome}}.{type}", type=["amb", "ann", "pac"]),
    log:
        "log/rule_index_{reference_genome}.log",
    shell:
        "bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"

with config.yaml

reference_genome:
 fa1: "path/to/genome1"
 fa2: "path/to/genome2"
 fa3: "path/to/genome3"

I've modified the rule all to use the filepaths from config.yaml rather than the list ['fa1','fa2','fa3']. I've also removed the lambda wildcard from the input of rule index as it seems unnecessary.

Upvotes: 1

dariober
dariober

Reputation: 9062

My guess is that you want:

rule all:
    input:
        expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])

rule index: 
    input: 
        #reference_genomeFile
        ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
    output: 
        expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
    log: 
        'log/rule_index_{reference_genome}.log'
    shell: 
        "bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
snakemake -p -n -j 1 --configfile config.yaml

That is: Run rule index for each genome file, i.e., three times here. Each run of index generates the three index files. Note the use of double curly braces {{reference_genome}} to tell expand that this wildcard does not need to be expanded.


Example config.yaml:

reference_genome:
 fa1: "genome_1.fa"
 fa2: "genome_2.fa"
 fa3: "genome_3.fa"

Upvotes: 1

Related Questions