Reputation: 41
Context
I am running a Snakemake (v7.32.4) pipeline using the Slurm workload manager. I have set resources (time and memory for each rule) dynamically, based on input file size and the number of attempts, like this:
rule index:
    resources:
        mem_mb = lambda wildcards, input, attempt: (
            200 * attempt
        ),
        runtime = lambda wildcards, input, attempt: (
            "{minutes}min".format(
                minutes=max(
                    int((input.size_mb / 5000) * attempt),
                    1)
            )
        )
I have 2 related questions (should I split the post?):
1) Is it possible to set resources dynamically outside of the Snakefile?
I tried to set them in the profile config file but did not succeed (that was some time ago, so I cannot say exactly what I tried).
2) Having set resources dynamically inside the Snakefile, how do I do a dry run or generate a rulegraph?
If I run a dry-run I get the following error:
WorkflowError:
Cannot parse runtime value into minutes for setting runtime resource: <TBD>
This seems logical to me since the file doesn't exist yet. Nevertheless, I would like to know the specifics of all steps (except resources of course) to be run before actually running them, is this possible?
Something similar happens if I try to do a rulegraph:
snakemake --profile Config/Profiles/slurm -np --rulegraph | dot -Tsvg > rulegraph.svg
In this case, I get an empty file (probably because of the error in the dry run?).
Upvotes: 3
Views: 405
Reputation: 41
It depends on the Snakemake version. For v8.14.0, yes. For v7.32.4, sort of. See both scenarios below.
For v7.32.4, you can call a function in the Snakefile and define that function elsewhere, as proposed by @SultanOrazbayev:
Snakefile:
from resources import get_runtime

rule index:
    resources:
        runtime = get_runtime
resources.py:
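def get_runtime(wildcards: dict, input: str|list[str], attempt: int) -> str:
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        # the input file does not exist yet (e.g. during a dry run)
        minutes = 10  # this is some test value
    # f-string, so the computed value is substituted into the result
    return f"{minutes}min"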
For v8.14.0, install the slurm executor plugin:
pip install snakemake-executor-plugin-slurm
Then you can specify resources dynamically (entirely) in your workflow profile config.yaml file:
executor: slurm
set-resources:
  index:
    runtime: f"{max(int((input.size_mb / 5_000) * attempt), 1)}min"
NOTE: I only tried Snakemake v8 with the slurm plugin, and I found this solution in the slurm plugin documentation. Hence, I don't know whether using the workflow profile as described above would work without the slurm plugin.
The second question (dry run and rulegraph) only applies to Snakemake version 7 or lower. On Snakemake v8 (I specifically tried v8.14.0), dry runs and graphs work fine, even if the input file does not exist yet.
For Snakemake v7.32.4, one way to solve it is to handle the error inside the resource function, as described above (proposed by @SultanOrazbayev).
Upvotes: 1
Reputation: 16581
This might not work best for your needs, but here's one approach. Create a script containing the functions that will determine the resources (addressing question 1), something like this:
def get_mem_mb(wildcards: dict, input: str|list[str], attempt: int) -> int:
    # any complex logic
    mem_mb = 200 * attempt
    return mem_mb


def get_runtime(wildcards: dict, input: str|list[str], attempt: int) -> str:
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except Exception:
        minutes = 10  # this is some test value
        # ideally, you'd want to figure out the actual error
        # and make a more narrow except clause
    # f-string, so the computed value is substituted into the result
    return f"{minutes}min"
Note that the function get_runtime includes a case when the input may not have the right properties, covering the dry-run situation (question 2). There is scope to adjust this further, of course, to adapt to your specific use case.
The Snakefile would look like this:
from resources import get_mem_mb, get_runtime

rule index:
    resources:
        mem_mb = get_mem_mb,
        runtime = get_runtime
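To check the fallback without launching Snakemake, the runtime function can be exercised on its own. The sketch below is only an illustration: FakeInput is a hypothetical stand-in for the input object Snakemake passes in (it is not a Snakemake class), and it assumes that a missing file surfaces as FileNotFoundError when size_mb is read, which is the exception the other answer catches.

```python
def get_runtime(wildcards, input, attempt):
    """Runtime in minutes, scaled by input size and retry attempt."""
    try:
        minutes = max(int((input.size_mb / 5_000) * attempt), 1)
    except FileNotFoundError:
        # Assumed dry-run case: the input file does not exist yet.
        minutes = 10  # placeholder test value, as in the answer
    return f"{minutes}min"


class FakeInput:
    """Hypothetical stand-in for Snakemake's input object (illustration only)."""

    def __init__(self, size_mb=None):
        self._size_mb = size_mb

    @property
    def size_mb(self):
        # Mimic a missing file by raising, as assumed in the lead-in.
        if self._size_mb is None:
            raise FileNotFoundError("input file does not exist yet")
        return self._size_mb


# 12,000 MB input: 12000 / 5000 = 2.4, truncated to 2 minutes on attempt 1
print(get_runtime({}, FakeInput(12_000), attempt=1))
# Same input on attempt 3: 2.4 * 3 = 7.2, truncated to 7 minutes
print(get_runtime({}, FakeInput(12_000), attempt=3))
# Tiny input: the max(..., 1) floor keeps the request at 1 minute
print(get_runtime({}, FakeInput(100), attempt=1))
# Missing file: the fallback value is used instead of raising
print(get_runtime({}, FakeInput(), attempt=1))
```

The point of the fallback is that the function always returns a parseable string such as "10min", so the runtime value never ends up as an unresolved placeholder during a dry run.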
Upvotes: 1