PD Schloss

Reputation: 149

Using wildcards in slurm resources directive with snakemake

I'm using snakemake to create rules and submit jobs on our HPC with slurm. To make the output "prettier", I would like to be able to set the job_name argument in the resources directive so that the wildcards being used are integrated into the job name.

For example...

datasets = ["bioethanol", "human", "lake"]

rule clean_data:
  input: 
    script="code/example.sh",
    data="data/{dataset}/input.txt"
  output:
    "data/{dataset}/output.txt",
  resources:
    job_name="{dataset}_clean_data"  
    cpus=8,
    mem_mb=45000,
    time_min=3000
  shell:
    """
    {input.script} {input.data}
    """

I have a config.yaml file that looks like this...

# cluster commands
cluster: "sbatch --job-name={resources.job_name}
          --account=my_account 
          --partition=standard 
          --nodes=1 
          --time={resources.time_min} 
          --mem={resources.mem_mb}
          -c {resources.cpus} 
          -o logs_slurm/%x_%j.out"

When I do this, the three jobs that are created are all called {dataset}_clean_data, without the actual dataset names inserted in place of {dataset}. Is there a way to get the job names to instead be bioethanol_clean_data, human_clean_data, and lake_clean_data?

Upvotes: 0

Views: 820

Answers (1)

Troy Comi

Reputation: 2079

In your resources directive, you need to use an input function (here, a lambda over the wildcards):

  resources:
    job_name=lambda wildcards: f"{wildcards.dataset}_clean_data"  
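
Applied to the rule from the question, that might look like the following (an untested sketch that keeps the other resource values as they are):

rule clean_data:
  input:
    script="code/example.sh",
    data="data/{dataset}/input.txt"
  output:
    "data/{dataset}/output.txt",
  resources:
    # the lambda is evaluated per job, so wildcards.dataset is the concrete value
    job_name=lambda wildcards: f"{wildcards.dataset}_clean_data",
    cpus=8,
    mem_mb=45000,
    time_min=3000
  shell:
    """
    {input.script} {input.data}
    """

With the config.yaml from the question unchanged, sbatch then receives --job-name=bioethanol_clean_data, human_clean_data, and lake_clean_data for the three jobs.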

You may run into some issues with having to remember to include that in every rule's resources. Some other options include:

  • Make slurm outputs go to the snakemake log file, which includes wildcards.
  • Change the name of each rule to include the wildcards (I have not tested this)
  • Use wildcards in the cluster submission command, e.g. --job-name={rule}_{wildcards} (sketched below)
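
For that last option, the sbatch line in the profile's config.yaml could build the job name from the fields snakemake substitutes when it formats the cluster command; {rule} and {wildcards} are available there just like {resources.*}. An untested sketch based on the question's config:

# cluster commands
cluster: "sbatch --job-name={rule}_{wildcards}
          --account=my_account
          --partition=standard
          --nodes=1
          --time={resources.time_min}
          --mem={resources.mem_mb}
          -c {resources.cpus}
          -o logs_slurm/%x_%j.out"

If the full wildcards string makes an awkward job name, a single wildcard such as {wildcards.dataset} should also work there.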

I would lean towards using logs, but do check out using wildcards in the job name.
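
A minimal, untested sketch of the log route, assuming the logs_slurm/ directory already exists (slurm does not create missing output directories): give the rule a log file whose name contains the wildcard, and point sbatch's -o at it through the {log} placeholder in config.yaml.

rule clean_data:
  input:
    script="code/example.sh",
    data="data/{dataset}/input.txt"
  output:
    "data/{dataset}/output.txt",
  log:
    # the wildcard is part of the file name, so every job gets its own log
    "logs_slurm/{dataset}_clean_data.log"
  resources:
    cpus=8,
    mem_mb=45000,
    time_min=3000
  shell:
    """
    {input.script} {input.data}
    """

# in config.yaml, send slurm's stdout/stderr to the per-job log file;
# the job name here simply falls back to the rule name
cluster: "sbatch --job-name={rule}
          --account=my_account
          --partition=standard
          --nodes=1
          --time={resources.time_min}
          --mem={resources.mem_mb}
          -c {resources.cpus}
          -o {log}"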

Upvotes: 1
