Agustin Carbajal

Reputation: 41

Setting resources dynamically in Snakemake

Context
I am running a Snakemake (v7.32.4) pipeline with the SLURM workload manager. I set the resources of each rule (runtime and memory) dynamically, based on the input file size and the attempt number, like this:

rule index:
    resources:
        mem_mb = lambda wildcards, input, attempt: (
            200 * attempt
        ),
        runtime = lambda wildcards, input, attempt: (
            "{minutes}min".format(
                minutes=max(
                    int((input.size_mb / 5000) * attempt),
                    1)
            )
        )
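To see what these lambdas would request on each retry without submitting any jobs, the same formulas can be copied into plain Python. This is a minimal sketch, assuming only that the input object exposes `size_mb` the way the rule uses it; `FakeInput` is a hypothetical stand-in, not a Snakemake class.

```python
# Plain-Python copy of the resource formulas from the rule above, so they can
# be checked without Snakemake. FakeInput is a hypothetical stand-in for
# Snakemake's input object; only its size_mb attribute is used.
from dataclasses import dataclass


@dataclass
class FakeInput:
    size_mb: float


def mem_mb(wildcards, input, attempt):
    # 200 MB on the first attempt, growing linearly with each retry
    return 200 * attempt


def runtime(wildcards, input, attempt):
    # 1 minute per 5000 MB of input, scaled by attempt, never below 1 minute
    minutes = max(int((input.size_mb / 5000) * attempt), 1)
    return "{minutes}min".format(minutes=minutes)


if __name__ == "__main__":
    big = FakeInput(size_mb=12_000)
    for attempt in (1, 2, 3):
        print(attempt, mem_mb({}, big, attempt), runtime({}, big, attempt))
    # 1 200 2min
    # 2 400 4min
    # 3 600 7min
    print(runtime({}, FakeInput(size_mb=100), 1))  # 1min (the 1-minute floor)
```

Because of the `int(...)` truncation, any input under 5000 MB gets the 1-minute floor on the first attempt.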

I have two related questions (should I split the post?):
1) Is it possible to set resources dynamically outside of the Snakefile? I tried to set them in the profile config file but didn't succeed (it was some time ago, so I can't say exactly what I tried).

2) With the resources set dynamically inside the Snakefile, how do I do a dry run or a rulegraph?
If I run a dry run, I get the following error:

WorkflowError:
Cannot parse runtime value into minutes for setting runtime resource: <TBD>

This seems logical to me, since the input file doesn't exist yet. Nevertheless, I would like to see the details of every step to be run (except, of course, the resources) before actually running them. Is this possible?
Something similar happens when I try to produce a rulegraph:

snakemake --profile Config/Profiles/slurm -np --rulegraph | dot -Tsvg > rulegraph.svg

In this case I get an empty file (probably because of the same error as in the dry run?).

Upvotes: 3

Views: 405

Answers (2)

Agustin Carbajal

Reputation: 41

1) Is it possible to set resources dynamically outside of the Snakefile?

It depends on the Snakemake version: for v8.14.0, yes; for v7.32.4, only partially. See both scenarios below.

Snakemake version 7.32.4

You can call a function in the Snakefile and define that function elsewhere, as proposed by @SultanOrazbayev:
Snakefile:

from resources import get_runtime

rule index:
    resources:
        runtime = get_runtime

resources.py:

def get_runtime(wildcards, input, attempt: int) -> str:
    # `input` is Snakemake's input object, which provides `size_mb`
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        # input does not exist yet (e.g. during a dry run)
        minutes = 10  # this is some test value
    return f"{minutes}min"
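The fallback branch can be exercised without Snakemake. This is a minimal sketch, assuming (as the except clause above does) that reading `size_mb` for a file that does not exist yet raises `FileNotFoundError`; `ExistingInput` and `MissingInput` are hypothetical stand-ins for Snakemake's input object.

```python
# Check the dry-run fallback of get_runtime without running Snakemake.
# ExistingInput / MissingInput are hypothetical stand-ins for Snakemake's
# input object. MissingInput assumes that reading size_mb on a not-yet-created
# file raises FileNotFoundError, which is what the except clause relies on.


class ExistingInput:
    size_mb = 20_000.0


class MissingInput:
    @property
    def size_mb(self):
        raise FileNotFoundError("input not created yet")


def get_runtime(wildcards, input, attempt: int) -> str:
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        minutes = 10  # placeholder used when the input does not exist yet
    return f"{minutes}min"


if __name__ == "__main__":
    print(get_runtime({}, ExistingInput(), 1))  # 4min
    print(get_runtime({}, MissingInput(), 1))   # 10min
```

Keeping the function in a separate module is what makes this kind of quick check possible in the first place.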

Snakemake version 8.14.0

Install the SLURM executor plugin:

pip install snakemake-executor-plugin-slurm

Then you can specify resources dynamically (entirely) in the config.yaml file of your workflow profile:

executor: slurm  
set-resources:
    index:
      runtime: f"{max(int((input.size_mb / 5_000) * attempt), 1)}min"

NOTE: I only tried Snakemake v8 with the SLURM plugin, and I found this solution in the SLURM plugin documentation. Hence, I don't know whether the workflow profile described above would work without the SLURM plugin.

2) Having set resources dynamically inside Snakefile, how do I do a dry run or a rulegraph?

This question only applies to Snakemake version 7 or lower. On Snakemake v8 (I specifically tried v8.14.0), dry runs and graphs work fine, even if the input file does not exist yet.
As for Snakemake v7.32.4, one way to solve it is to handle the error as described above (proposed by @SultanOrazbayev).

Upvotes: 1

SultanOrazbayev

Reputation: 16581

This might not fit your needs perfectly, but here is one approach. Create a script containing the functions that determine the resources (addressing question 1), something like this:

def get_mem_mb(wildcards, input, attempt: int) -> int:
    # any complex logic
    mem_mb = 200 * attempt
    return mem_mb

def get_runtime(wildcards, input, attempt: int) -> str:
    # `input` is Snakemake's input object, which provides `size_mb`
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except Exception:
        minutes = 10  # this is some test value
        # ideally, you'd want to figure out the actual error
        # and make a more narrow except clause
    return f"{minutes}min"
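The advice to narrow the except clause matters because `except Exception` also swallows genuine bugs. This is a minimal sketch contrasting the two, assuming `FileNotFoundError` is the error raised for inputs that do not exist yet (the variant used in the other answer); `BadInput` is a hypothetical stand-in whose `size_mb` has the wrong type, imitating a real bug rather than a missing file.

```python
# Contrast a broad `except Exception` with a narrower clause.
# BadInput is a hypothetical stand-in whose size_mb has the wrong type,
# imitating a genuine bug rather than a not-yet-created input file.


class BadInput:
    size_mb = "20000"  # a string by mistake: dividing it raises TypeError


def runtime_broad(wildcards, input, attempt: int) -> str:
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except Exception:
        minutes = 10  # silently hides *any* error, including real bugs
    return f"{minutes}min"


def runtime_narrow(wildcards, input, attempt: int) -> str:
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        minutes = 10  # only the "input not created yet" case falls back
    return f"{minutes}min"


if __name__ == "__main__":
    print(runtime_broad({}, BadInput(), 1))  # 10min, the bug goes unnoticed
    try:
        runtime_narrow({}, BadInput(), 1)
    except TypeError as err:
        print("bug surfaced:", err)
```

With the narrow clause, a mistake in the resource logic fails loudly instead of quietly submitting every job with the placeholder runtime.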

Note that get_runtime handles the case where the input does not have the expected properties (for example, because the file does not exist yet), which covers the dry-run situation (question 2). This can of course be adjusted further to fit your specific use case.

The Snakefile would look like this:

from resources import get_mem_mb, get_runtime

rule index:
    resources:
        mem_mb = get_mem_mb,
        runtime = get_runtime

Upvotes: 1
