hepcat72

Reputation: 1124

How to set resources for snakemake rules running on a SLURM cluster?

I've been working on my first foray into snakemake and have steadily built up a workflow, now 108 steps, that runs fine on Mac and Linux workstations. Now I would like to try running larger analyses on a SLURM cluster. Another person on our team got a profile working for our cluster, and it runs fine on our small test data.

However, now I'm trying to pipe some real, large data through the workflow, and I am encountering resource limit errors. The first one is for a tool called samtools sort. It ran out of memory, and according to the log it had requested 1000 MB total for a 12-thread job. I read in the docs that something like this would let me request more memory for that specific run, based on the number of CPUs requested:

    resources:
        mem_mb_per_cpu: 1500

However, when I try to run it, it tells me that the line containing mem_mb_per_cpu has invalid syntax. I tried modifying it in several ways, even though that meant diverging from the example in the docs:

    resources:
        partition: <partition name>
        runtime: <some number>

The entire rule was:

    rule sort_atac_alignments:
        input:
            "results/atac_alignments/{sample_id}.bam",
        output:
            "results/sorted_atac_alignments/{sample_id}.bam",
        threads: 12
        params:
            smt_threads=lambda w, threads: threads - 1,
        log:
            "results/sorted_atac_alignments/logs/{sample_id}_sort.log",
        conda:
            "../envs/samtools.yml"
        resources:
            mem_mb_per_cpu: 1500
        shell:
            "samtools sort --threads {params.smt_threads} {input:q} -o {output:q} 2> {log:q}"

So then I decided to remove the resources settings I'd added to the rule and tried specifying the memory on the command line by adding this to my command:

    --set-resources sort_atac_alignments:mem_mb_per_cpu=1500

And that worked! (I did get a warning about not having set a walltime, which I didn't get on the first run, where I hit my first memory error.) Now I'm encountering similar resource limit issues in subsequent rules, but I don't want to have to specify them all on the command line every time. So what's the problem with the syntax when I add resources to my rule?

I realize there is more than one way to specify resources for SLURM runs, and I may end up using a different strategy, but for my own sanity, I'd like to know what this syntax issue is.

Note, I'd tried a few other things, but I'll only mention them if necessary.

And here is the full and exact command and error, along with the exact line that produced the error:

    $ snakemake --use-conda --cores 12 --notemp --printshellcmds --profile /Genomics/argo/users/rleach/ATACCompendium/Profile_Name --slurm --directory /Genomics/biocomp/rleach/YURI/ATACC/CD4/cd4_keep
    SyntaxError in file /Genomics/argo/users/rleach/ATACCompendium/workflow/rules/atac_align.smk, line 80:
    invalid syntax
      File "/Genomics/argo/users/rleach/ATACCompendium/workflow/Snakefile", line 151, in <module>
    $ head -n 81 /Genomics/argo/users/rleach/ATACCompendium/workflow/rules/atac_align.smk | tail -n 3
        resources:
            mem_mb_per_cpu: 1500
        shell:

Upvotes: 1

Views: 923

Answers (1)

hepcat72

Reputation: 1124

Sigh, the example on the docs page I linked is wrong. That page links to more detail about resource specifications, which shows a different example that uses = instead of : between the resource name and its value:

    resources:
        mem_mb=100

I made that change (mem_mb_per_cpu=1500), and it worked. I'd previously submitted a report for another documentation bug, but it hasn't gone anywhere, so I'm not sure it's worth the trouble of submitting another one.
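As for why = is needed: as far as I can tell, entries under directives like resources and params are written the same way as Python keyword arguments (the params entry in the rule above already uses smt_threads=...), so a YAML-style name: value entry is not valid there. Here is a rough sketch in plain Python that illustrates the difference. It does not use snakemake at all, and parses_as_call_argument is a made-up helper for this demonstration, not a snakemake function:

```python
# Sketch only: plain Python, no snakemake involved.
# Assumption: a directive entry behaves like an argument in a Python call,
# so wrapping it in a dummy call shows whether the entry is valid syntax.


def parses_as_call_argument(entry: str) -> bool:
    """Return True if `entry` is valid syntax inside a Python call."""
    dummy_call = f"resources({entry})"  # hypothetical wrapper, never executed
    try:
        # compile() only checks syntax; the name `resources` is never looked up.
        compile(dummy_call, "<directive>", "eval")
        return True
    except SyntaxError:
        return False


# Keyword-argument form, as in the corrected rule.
print(parses_as_call_argument("mem_mb_per_cpu=1500"))   # True

# YAML-style form, as in the docs example that triggered the error.
print(parses_as_call_argument("mem_mb_per_cpu: 1500"))  # False
```

The second call fails for the same kind of reason the rule did: the colon form is not valid where a keyword argument is expected.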

Upvotes: 3
