Leo T. Osborne Jr

Rule-specific wildcards in Snakemake

I often find when adding rules to my workflow that I need to split large jobs up into batches. This means that my input/output files will branch out across temporary sets of batches for some rules before consolidating again into one input file for a later rule. For example:

rule all:
    input:
        ## this final output relates to the blast rule: it will include a column defining the transcript type
        expand("final_output/{sample}.counts", sample=config["samples"])

...


rule batch_prep:
    input: "transcriptome.fasta"
    output: expand("blast_input_{X}.fasta", X=[1, 2, 3, 4, 5])
    script: "scripts/split_transcriptome.sh"
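The contents of scripts/split_transcriptome.sh are not shown, so here is a hypothetical Python stand-in for the batch_prep step (Snakemake's script directive can also run Python scripts). It is only a sketch under stated assumptions: records are dealt out round-robin, and the list of output files is passed in rather than built from a prefix, so the files written are exactly the ones the rule declares. The names read_fasta and split_fasta are made up for illustration.

```python
def read_fasta(path):
    """Yield (header, sequence_lines) for each record in a FASTA file."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq_lines
                header, seq_lines = line, []
            elif header is not None and line:  # skip blank lines
                seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def split_fasta(in_path, out_paths):
    """Deal FASTA records round-robin into the given output files (assumed strategy)."""
    # Open every output first so each declared batch file exists, even if empty.
    handles = [open(p, "w") for p in out_paths]
    try:
        for i, (header, seq_lines) in enumerate(read_fasta(in_path)):
            out = handles[i % len(handles)]
            out.write(header + "\n")
            for seq in seq_lines:
                out.write(seq + "\n")
    finally:
        for h in handles:
            h.close()


# Hypothetical call from a Python script: directive:
# split_fasta(snakemake.input[0], snakemake.output)
```

Because every batch file is opened up front, all declared outputs exist even when there are fewer records than batches, which avoids Snakemake complaining about missing output files.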

rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast.txt"
    script: "scripts/blastx.sh"


...




rule rsem:
    input:
        "transcriptome.fasta",
        "{sample}.fastq"
    output:
        "final_output/{sample}.counts"
    script:
        "scripts/rsem.sh"

In this simplified workflow, snakemake -n would show a separate rsem job for each sample (as expected, from wildcards set in rule all). However, blast would give a WildcardError stating that

Wildcards in input files cannot be determined from output files:
'X'

This makes sense, but I can't figure out a way for the Snakefile to submit separate jobs for each of the 5 batches above using the single blast template rule. I can't make separate rules for each batch, as the number of batches will vary with the size of the dataset. It seems it would be useful if I could define wildcards local to a rule. Does such a thing exist, or is there a better way to solve this issue?

Answers (1)

jafors

I hope I understood your problem correctly; if not, feel free to correct me.

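So you want to run the blast rule once for every "blast_input_{X}.fasta"? In that case, the batch wildcard needs to be carried over into the output as well, because Snakemake determines a job's wildcard values by matching the requested output file against the rule's output pattern: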

rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast_{X}.txt"
    script: "scripts/blastx.sh"
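To see why the original blast rule fails and the fixed one works, here is a toy Python model of wildcard inference. It is not Snakemake's implementation; it only assumes the behaviour described above: the requested file is matched against the output pattern (using .+ for each wildcard, Snakemake's default constraint), and the input pattern is then filled in from the captured values. pattern_to_regex and resolve_input are hypothetical helpers.

```python
import re

WILDCARD = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern):
    """Turn 'output_blast_{X}.txt' into an anchored regex with one named group per wildcard.

    Assumes each wildcard name appears at most once in the pattern.
    """
    parts, pos = [], 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")  # '.+' mirrors Snakemake's default wildcard constraint
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return "^" + "".join(parts) + "$"


def resolve_input(target, output_pattern, input_pattern):
    """Infer wildcard values from a requested file, then fill in the input pattern."""
    match = re.match(pattern_to_regex(output_pattern), target)
    if match is None:
        raise ValueError(f"{target!r} does not match {output_pattern!r}")
    wildcards = match.groupdict()
    missing = set(WILDCARD.findall(input_pattern)) - set(wildcards)
    if missing:
        # Same situation as the WildcardError in the question: X never appears in the output.
        raise ValueError(f"cannot determine {sorted(missing)} from output {output_pattern!r}")
    return input_pattern.format(**wildcards)


# Fixed rule: X is captured from the output name and reused for the input.
print(resolve_input("output_blast_3.txt", "output_blast_{X}.txt", "blast_input_{X}.fasta"))
# prints blast_input_3.fasta
```

With the original output "output_blast.txt" there is nothing to capture, so X stays unknown and the input cannot be resolved, which is exactly what the error message reports.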

If you later want to merge the batches again in another rule, just use expand in the input of that rule:

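rule merge_blast:  # hypothetical rule name
    input: expand("output_blast_{X}.txt", X=your_batches)
    output: "merged_blast_output.txt"
    shell: "cat {input} > {output}"  # assumed: simple concatenation of the batch results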
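Because a Snakefile is Python, the list passed to expand (your_batches above) can be computed rather than hard-coded, which also covers the case where the number of batches depends on the dataset. Below is a simplified Python stand-in for expand's default behaviour (one string per combination of wildcard values, via itertools.product) together with a batch list built from a count. expand_like and n_batches are hypothetical names, and the real expand supports more options than this sketch.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand(): one string per combination of values."""
    names = list(wildcards)
    # A bare string counts as a single value, not as a sequence of characters.
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*values)]


n_batches = 5  # hypothetical: could come from config or be derived from the input size
your_batches = range(1, n_batches + 1)
merged_inputs = expand_like("output_blast_{X}.txt", X=your_batches)
print(merged_inputs)
```

Using the same your_batches list for the batch_prep outputs and for the merge input keeps the steps consistent, since Snakemake will then request exactly the batch files that the split step declares.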

