Reputation: 65
I often find when adding rules to my workflow that I need to split large jobs up into batches. This means that my input/output files will branch out across temporary sets of batches for some rules before consolidating again into one input file for a later rule. For example:
rule all:
    input:
        expand("final_output/{sample}.counts", sample=config["samples"])  ## this final output relates to blast rule in that it will feature a column defining transcript type
...
rule batch_prep:
    input: "transcriptome.fasta"
    output: expand("blast_input_{X}.fasta", X=[1,2,3,4,5])
    script: "scripts/split_transcriptome.sh"
rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast.txt"
    script: "scripts/blastx.sh"
...
rule rsem:
    input:
        "transcriptome.fasta",
        "{sample}.fastq"
    output:
        "final_output/{sample}.counts"
    script:
        "scripts/rsem.sh"
In this simplified workflow, snakemake -n would show a separate rsem job for each sample (as expected, from wildcards set in rule all). However, blast would give a WildcardError stating that
Wildcards in input files cannot be determined from output files:
'X'
This makes sense, but I can't figure out a way for the Snakefile to submit separate jobs for each of the 5 batches above using the one blast template rule. I can't make separate rules for each batch, as the number of batches will vary with the size of the dataset. It seems it would be useful if I could define wildcards local to a rule. Does such a thing exist, or is there a better way to solve this issue?
Upvotes: 2
Views: 261
Reputation: 326
I hope I understood your problem correctly; if not, feel free to correct me.
So, you want to call the rule blast for every "blast_input_{X}.fasta"?
Then the batch wildcard needs to be carried over into the output, because Snakemake works out wildcard values from the output file names, as the error message says:
rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast_{X}.txt"
    script: "scripts/blastx.sh"
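To illustrate why the wildcard has to appear in the output, here is a rough plain-Python sketch of the matching step. It is not Snakemake's actual implementation; infer_wildcards is a hypothetical helper written only for this example. The idea is that a requested file name is matched against the output pattern, and whatever is captured there is used to fill in the input pattern.

```python
import re


def infer_wildcards(pattern, requested_file):
    """Return the wildcard values that make `pattern` equal `requested_file`, or None.

    Hypothetical helper for illustration only; Snakemake's own matching is more involved.
    """
    regex = ""
    pos = 0
    # Turn each {name} placeholder into a named regex group and escape the literal parts.
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    match = re.fullmatch(regex, requested_file)
    return match.groupdict() if match else None


blast_input = "blast_input_{X}.fasta"

# Output pattern that carries the wildcard: X can be read off the requested file name.
wildcards = infer_wildcards("output_blast_{X}.txt", "output_blast_3.txt")
resolved_input = blast_input.format(**wildcards)

# Output pattern without the wildcard: nothing is captured, so the input cannot be filled in.
no_wildcards = infer_wildcards("output_blast.txt", "output_blast.txt")
try:
    blast_input.format(**no_wildcards)
    unresolved = False
except KeyError:
    unresolved = True
```

The second case mirrors the WildcardError from the question: with "output_blast.txt" as the output there is no place from which a value for X could be taken.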
If you later want to merge the batches again in another rule, just use expand in the input of that rule.
input: expand("output_blast_{X}.txt", X=your_batches)
output: "merged_blast_output.txt"
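Since the number of batches varies with the dataset, your_batches does not have to be hard-coded. Below is a minimal sketch in plain Python of how the list could be computed once and reused; the n_batches config key is an assumption made for this example and is not part of the original workflow. The list comprehensions produce the same file names that expand would give for a single wildcard X.

```python
# Hypothetical config entry; in a Snakefile this could be read from config["n_batches"].
config = {"n_batches": 5}

# Batch labels 1..n, defined once so the split rule and the merge rule stay in sync.
your_batches = list(range(1, config["n_batches"] + 1))

# File names the splitting rule would declare as its outputs.
split_outputs = [f"blast_input_{X}.fasta" for X in your_batches]

# File names the merging rule would declare as its inputs.
merge_inputs = [f"output_blast_{X}.txt" for X in your_batches]
```

Defining the batch list in one place means that changing the batch count only requires touching the config, not the rules.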
Upvotes: 3