Reputation: 149
I'm using Snakemake to create rules and submit jobs to our HPC cluster with Slurm. To make the output "prettier", I would like to set the job_name argument in the resources directive so that the wildcards being used are incorporated into the job name.
For example...
datasets = ["bioethanol", "human", "lake"]

rule clean_data:
    input:
        script="code/example.sh",
        data="data/{dataset}/input.txt"
    output:
        "data/{dataset}/output.txt",
    resources:
        job_name="{dataset}_clean_data",
        cpus=8,
        mem_mb=45000,
        time_min=3000
    shell:
        """
        {input.script} {input.data}
        """
I have a config.yaml file that looks like this...
# cluster commands
cluster: "sbatch --job-name={resources.job_name}
    --account=my_account
    --partition=standard
    --nodes=1
    --time={resources.time_min}
    --mem={resources.mem_mb}
    -c {resources.cpus}
    -o logs_slurm/%x_%j.out"
When I do this, the three jobs that are created are all named {dataset}_clean_data; the actual dataset names are never substituted for {dataset}. Is there a way to get the job names to be bioethanol_clean_data, human_clean_data, and lake_clean_data instead?
Upvotes: 0
Views: 820
Reputation: 2079
In your resources directive, you need to use a function of the wildcards (just like an input function):
resources:
    job_name=lambda wildcards: f"{wildcards.dataset}_clean_data"
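For example, dropped into the rule from the question, the resources directive could look like this (a sketch reusing the question's values; the callable job_name is the only change):
rule clean_data:
    input:
        script="code/example.sh",
        data="data/{dataset}/input.txt"
    output:
        "data/{dataset}/output.txt",
    resources:
        # a callable resource is evaluated per job with the concrete
        # wildcard values filled in, so each submission gets its own name
        job_name=lambda wildcards: f"{wildcards.dataset}_clean_data",
        cpus=8,
        mem_mb=45000,
        time_min=3000
    shell:
        """
        {input.script} {input.data}
        """
With the cluster command from the question passing --job-name={resources.job_name}, the submitted jobs should then show up as bioethanol_clean_data, human_clean_data, and lake_clean_data.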
You may run into issues with having to remember to include that resource in every rule. Some other options include:
- changing the name of each rule to include wildcards (have not tested)
- using the wildcards in the cluster submission command, e.g. --job-name={rule}_{wildcards} (sketched below)
I would lean towards just using the logs, but do check out using wildcards in the job name.
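As a rough sketch of the wildcards option (untested; {rule} and {wildcards} are placeholders Snakemake fills in from the job being submitted, and the rendered {wildcards} string may need quoting or shortening if a rule has several wildcards), the cluster line from the question's config.yaml would become:
# cluster commands
cluster: "sbatch --job-name={rule}_{wildcards}
    --account=my_account
    --partition=standard
    --nodes=1
    --time={resources.time_min}
    --mem={resources.mem_mb}
    -c {resources.cpus}
    -o logs_slurm/%x_%j.out"
This keeps the naming in one place instead of repeating a job_name resource in every rule.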
Upvotes: 1