Reputation: 543
I have a list of fasta may be used, but some of them may also not be used, so I hope to use snakemake to index fastq if need
I bulit a yaml file like this
# config.yaml
reference_genome:
fa1: "path/to/genome"
fa2: "..."
fa3: "..."
...
and I write a snakemake like this
configfile: "config.yaml"
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{reference_genome}.{type}', reference_genome={reference_genome}, type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
I hope snakemake can monitor the index file (amb, ann, pac)
, but this script will raise follow error:
name 'reference_genome' is not defined
File "/public/...", line ..., in <module>
update: base on @dariober's answer: if we runing with following config.yaml
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"
I expect the output is
genome_1.fa.{amb, ann, pac}
genome_2.fa.{amb, ann, pac}
genome_3.fa.{amb, ann, pac}
If we use following workaround
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
we will get
$ snakemake -s snakemake_test.smk --configfile config.yaml
# for reference_name is fa1
[Fri Dec 2 17:56:29 2022]
rule index:
input: genome_1.fa
output: fa1.amb, fa1.ann, fa1.pac
log: log/rule_index_fa1.log
jobid: 1
wildcards: reference_genome=fa1
...
Thats not my expected output
the output is fa1.amb, fa1.ann, fa1.pac, but I wanted output is genome_1.fa.amb, genome_1.fa.ann, genome_1.fa.pac
Upvotes: 2
Views: 798
Reputation: 1277
Building upon dariober's answer and judging from your comments, I think this is what you are looking for?
configfile: "config.yaml"
rule all:
input:
expand(
"{reference_genome}.{type}",
reference_genome=list(config["reference_genome"].values()),
type=["amb", "ann", "pac"],
),
rule index:
input:
#reference_genomeFile
ref_genome="{reference_genome}",
output:
expand("{{reference_genome}}.{type}", type=["amb", "ann", "pac"]),
log:
"log/rule_index_{reference_genome}.log",
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
with config.yaml
reference_genome:
fa1: "path/to/genome1"
fa2: "path/to/genome2"
fa3: "path/to/genome3"
I've modified the rule all
to use the filepaths from config.yaml
rather than the list ['fa1','fa2','fa3']
. I've also removed the lambda wildcard
from the input of rule index
as it seems unnecessary.
Upvotes: 1
Reputation: 9062
My guess is that you want:
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
snakemake -p -n -j 1 --configfile config.yaml
That is: Run rule index
for each genome file, i.e., three times here. Each run of index
generates the three index files. Note the use of double curly braces {{reference_genome}}
to tell expand
that this wildcard does not need to be expanded.
Example config.yaml:
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"
Upvotes: 1