Kim

Reputation: 11

Is Snakemake the right tool for handling output-mediated workflows?

I'm new to Snakemake (I started trying it out about a week ago) and am using it so that I have to handle fewer of the small details of my workflows myself. Previously I coded up my own specific workflow in Python.

I built a small workflow which, among other steps, takes Illumina PE reads and runs Kraken against them. If a species value isn't provided, I then parse the Kraken output to detect the most common species (within a set of allowable species). I run it with snakemake -s test.snake --config R1_reads= R2_reads= species=''.
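The species-detection step described above could look roughly like the sketch below. It assumes the input is a Kraken report in the usual six-column, tab-separated layout (percentage, clade reads, direct reads, rank code, taxid, indented name); the function name `most_common_species` and the allowed-species list are hypothetical and not taken from the actual workflow.

```python
from typing import Iterable, Optional


def most_common_species(report_text: str, allowed: Iterable[str]) -> Optional[str]:
    """Return the allowed species with the most clade-level reads, or None.

    Assumes a tab-separated Kraken report with six columns:
    percentage, clade reads, direct reads, rank code, taxid, name.
    """
    allowed_set = set(allowed)
    best_name, best_reads = None, -1
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_reads, _direct, rank, _taxid, name = fields[:6]
        name = name.strip()  # names are indented to show tree depth
        if rank != "S" or name not in allowed_set:
            continue  # only species-level rows from the allowed set
        try:
            reads = int(clade_reads)
        except ValueError:
            continue  # non-numeric read count, ignore the row
        if reads > best_reads:
            best_name, best_reads = name, reads
    return best_name
```

Returning `None` when nothing in the allowed set is found leaves it to the caller to decide whether to stop the workflow or fall back to a default.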

I have 2 questions.

  1. What is the recommended approach given the dynamic output/input?

Currently my strategy is to create a temp file that contains the detected species and then cat {input.species} it into other shell commands. This doesn't seem elegant, but looking through the docs I couldn't find an adequate alternative. I noticed that PersistentDicts would let me pass variables between run: commands, but I'm unsure whether I can use that to load variables into a shell: section. I also noticed that wrappers could handle it, but from the point where I need that variable onwards I'd be wrapping the remainder of my workflow.
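One possible way to tidy up the temp-file approach is a small helper that reads the species once, sketched below. The helper name `read_species` is hypothetical. As far as I understand, Snakemake's params accept callables that receive the rule's wildcards and input, so a helper like this might be usable there to expose the value to a shell: section, but treat that as an assumption to check against the docs rather than a confirmed recipe.

```python
from pathlib import Path


def read_species(path: str) -> str:
    """Return the first non-empty line of the species file, stripped."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            return line.strip()
    # An empty file most likely means detection failed upstream.
    raise ValueError(f"no species found in {path}")
```

Failing loudly on an empty file avoids silently passing an empty species string into later commands.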

  2. Is Snakemake the right tool if I want to use the species afterwards to run a set of scripts specific to the species (with multiple species-specific workflows)?

Right now my impression is that the way to solve this is to have one workflow file per species, plus a run: step with a switch that calls the associated workflow depending on the detected species.
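The switch described above might be sketched as a lookup table that maps a species to its workflow file and builds the snakemake command line. The mapping `SPECIES_WORKFLOWS`, the file paths, and the function `build_species_command` are all hypothetical; only the -s and --config options come from the command shown earlier in this question.

```python
from typing import Dict, List

# Hypothetical mapping from species name to a species-specific workflow file.
SPECIES_WORKFLOWS: Dict[str, str] = {
    "Escherichia coli": "workflows/ecoli.snake",
    "Staphylococcus aureus": "workflows/saureus.snake",
}


def build_species_command(species: str) -> List[str]:
    """Build the argument list for running the workflow tied to `species`."""
    try:
        snakefile = SPECIES_WORKFLOWS[species]
    except KeyError:
        raise ValueError(f"no workflow registered for species {species!r}")
    # Returned as a list so it can be passed to subprocess.run without
    # shell quoting problems (species names contain spaces).
    return ["snakemake", "-s", snakefile, "--config", f"species={species}"]
```

A dictionary keeps the dispatch in one place, so adding a species means adding one entry instead of another branch.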

Appreciate any insight on these questions.

-Kim

Upvotes: 1

Views: 381

Answers (1)

Johannes Köster

Reputation: 1927

You can mark output as dynamic (e.g. expecting one file per species). Then, Snakemake will determine the downstream DAG of jobs after those files have been generated. See http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#dynamic-files

Upvotes: 0
