Reputation: 634
In Snakemake, conda environments can be easily set up by defining rules with a directive such as conda: "envs/my_environment.yaml". This way, YAML files specify which packages to install prior to running the pipeline.
Some software requires a path to third-party software in order to execute specific commands.
An example of this is when generating a reference index with RSEM (example from GitHub page DeweyLab - RSEM):
rsem-prepare-reference --gtf mm9.gtf \
--star \
--star-path /sw/STAR \
-p 8 \
--prep-pRSEM \
--bowtie-path /sw/bowtie \
--mappability-bigwig-file /data/mm9.bigWig \
/data/mm9 \
/ref/mouse_0
Can I locate or predefine the directory (e.g. [workdir]/.snakemake/conda/STAR) for the STAR aligner software, which is installed via conda in a prior rule?
Currently, one option may be to create a shared environment folder using the command-line option --conda-prefix (see Snakemake docs - Command-line interface). However, as this is a single-case issue, I would prefer to define this information in the rules.
Upvotes: 1
Views: 3535
Reputation: 3701
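I would like to add a third option to @merv's answer. You could use which to dynamically figure out the path (assuming which is available on your system and STAR is on PATH when the command runs). Note that the executable is named STAR in upper case, and that --star-path appears to expect the directory containing it rather than the binary itself:
rsem-prepare-reference --star-path "$(dirname "$(which STAR)")" ...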
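The same lookup can be sketched in Python, for instance in a script that the rule runs inside the activated Conda environment. This is only a minimal sketch: executable_dir is a hypothetical helper name, and it returns the directory rather than the full path on the assumption that --star-path takes the folder containing the binary, as the /sw/STAR example in the question suggests.

```python
import os
import shutil


def executable_dir(name, path=None):
    """Return the directory holding executable `name`, or None if not found.

    `path` is an optional PATH-style string; by default the current
    PATH environment variable is searched.
    """
    found = shutil.which(name, path=path)
    return os.path.dirname(os.path.abspath(found)) if found else None


if __name__ == "__main__":
    # The STAR binary is spelled in upper case.
    print(executable_dir("STAR"))
```

Keep in mind that this only finds the Conda-installed STAR if it is evaluated with the rule's environment active; evaluated at the top level of the Snakefile, it would search the PATH of the process running Snakemake instead.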
Upvotes: 2
Reputation: 76760
There are two ways that I've dealt with this.
That specific option (--star-path) only needs to be specified if STAR is not on PATH. However, if STAR is included in your YAML for this rule, then Conda will place it on PATH as part of the environment activation, and so that option won't be needed. The same goes for --bowtie-path. Hence, for such a rule the YAML might be something like:
name: rsem
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- rsem
- star
- bowtie
As per this thread, consider fixing the package versions up to the minor version (e.g., bowtie=1.3).
config.yaml for Pipeline Options
If for some reason you don't want a fully self-contained pipeline, e.g., your system already has lots of standard genomics software like STAR preinstalled, then you could include entries in your config.yaml that users should adjust to match their system. For example, here are the relevant parts:
config.yaml
star_path: /sw/STAR
bowtie_path: /sw/bowtie
Snakefile
configfile: "config.yaml"

## this is not a complete rule
rule rsem_prep_ref:
    # needs input, output...
    params:
        star=config['star_path'],
        bowtie=config['bowtie_path']
    threads: 8
    conda: "envs/myenv.yaml"
    shell:
        """
        rsem-prepare-reference --gtf mm9.gtf \
            --star \
            --star-path {params.star} \
            -p {threads} \
            --prep-pRSEM \
            --bowtie-path {params.bowtie} \
            --mappability-bigwig-file /data/mm9.bigWig \
            /data/mm9 \
            /ref/mouse_0
        """
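Because these paths are supplied by the user, it may help to check them before the rule runs. The following is a minimal sketch in plain Python, not something Snakemake provides: missing_tools is a hypothetical helper, and the mapping from the config keys above to executable names (STAR, bowtie) is an assumption about what each directory should contain.

```python
import os


def missing_tools(config, expected):
    """Return the config keys whose directory lacks the expected executable.

    `config` maps keys (e.g. "star_path") to directories;
    `expected` maps the same keys to executable file names.
    """
    missing = []
    for key, exe in expected.items():
        directory = config.get(key)
        candidate = os.path.join(directory, exe) if directory else None
        # A usable tool must be a regular file with the executable bit set.
        if not (candidate and os.path.isfile(candidate)
                and os.access(candidate, os.X_OK)):
            missing.append(key)
    return missing


# Keys taken from the config.yaml above; the executable names are assumed.
EXPECTED = {"star_path": "STAR", "bowtie_path": "bowtie"}
```

In a Snakefile, this could be called right after the configfile line, raising an error if the returned list is not empty, so that a misconfigured path fails early instead of partway through the pipeline.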
Really, anything your pipeline assumes already exists and is not generated by the pipeline itself should go into your config.yaml (e.g., mm9.gtf or mm9.bigWig).
Generally, I advise against trying to share environments. However, you can still conserve space by sharing a package cache across users and making sure environments are created on the same filesystem (this lets Conda use hardlinks instead of copying). You can use the Conda configuration option pkgs_dirs to set package cache locations. If the pipeline itself is already on the same file system as the Conda package cache, I would just let Snakemake use the default location (.snakemake/conda) and not mess with the --conda-prefix argument.
Otherwise, you can give Snakemake the --conda-prefix argument to point to a directory on the same file system in which to create Conda environments. This should be a rather generic directory in which all environments for the pipeline are located. What was proposed in the OP ([workdir]/.snakemake/conda/STAR) would not make sense.
Upvotes: 2