Reputation: 1124
I've been embarking on my first foray into using snakemake and have been steadily succeeding in creating a now 108 step workflow that runs fine on mac and linux workstations, but now I would like to try running larger analyses on a SLURM cluster. Another person on our team got a profile working for our cluster and it runs fine on our small test data.
However, now I'm trying to pipe some real, large data through the workflow and I am encountering resource limit errors. The first one is for a tool called samtools sort. It ran out of memory and, according to the log, it had requested 1000 MB total for a 12-thread job. I read in the docs that something like this would allow me to request more memory for that specific rule, based on the number of CPUs requested:
resources:
    mem_mb_per_cpu: 1500
However, when I try to run it, it tells me that the line containing mem_mb_per_cpu had invalid syntax. I tried modifying it in several ways, even though that diverged from the example in the docs:
resources:
    partition: <partition name>
    runtime: <some number>
The entire rule was:
rule sort_atac_alignments:
    input:
        "results/atac_alignments/{sample_id}.bam",
    output:
        "results/sorted_atac_alignments/{sample_id}.bam",
    threads: 12
    params:
        smt_threads=lambda w, threads: threads - 1,
    log:
        "results/sorted_atac_alignments/logs/{sample_id}_sort.log",
    conda:
        "../envs/samtools.yml"
    resources:
        mem_mb_per_cpu: 1500
    shell:
        "samtools sort --threads {params.smt_threads} {input:q} -o {output:q} 2> {log:q}"
So then I decided to remove the resources settings I'd added to the rule and tried specifying the memory on the command line by adding this to my command:
--set-resources sort_atac_alignments:mem_mb_per_cpu=1500
And that worked! (I did get a warning about not having set a walltime, which I didn't get on the first run, where I hit my first memory error.) Now I'm encountering similar resource limitation issues in subsequent rules, but I don't want to have to specify them all on the command line every time. So what's the problem with the syntax when I added resources to my rule?
I realize there is more than 1 way to specify resources for slurm runs and I may end up using a different strategy, but for my own sanity, I'd like to know what this syntax issue is.
Note, I'd tried a few other things, but I'll only mention them if necessary.
And here is the full and exact execution & error and the exact line in question that produced the error:
$ snakemake --use-conda --cores 12 --notemp --printshellcmds --profile /Genomics/argo/users/rleach/ATACCompendium/Profile_Name --slurm --directory /Genomics/biocomp/rleach/YURI/ATACC/CD4/cd4_keep
SyntaxError in file /Genomics/argo/users/rleach/ATACCompendium/workflow/rules/atac_align.smk, line 80:
invalid syntax
File "/Genomics/argo/users/rleach/ATACCompendium/workflow/Snakefile", line 151, in <module>
$ head -n 81 /Genomics/argo/users/rleach/ATACCompendium/workflow/rules/atac_align.smk | tail -n 3
    resources:
        mem_mb_per_cpu: 1500
    shell:
Upvotes: 1
Views: 923
Reputation: 1124
Sigh, the example on the docs page I linked is wrong. It links to more about resource specifications, where it shows a different example that uses = instead of : for the assignment:
resources:
    mem_mb=100
I made that change (mem_mb_per_cpu=1500), and it worked. I'd previously encountered a documentation bug that I submitted, but it hasn't gone anywhere, so I'm not sure it's worth the trouble of submitting another bug.
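For anyone wondering why the colon form is a syntax error rather than just an unknown key: as far as I understand it, Snakemake turns the body of a directive like resources into the arguments of a Python call, so each entry has to look like a keyword argument. That is an assumption about Snakemake's internals, not something I've confirmed in its source, and the _directive name below is a hypothetical placeholder. The sketch only checks whether a directive body would parse as Python call arguments:

```python
def parses_as_call_arguments(body: str) -> bool:
    """Return True if `body` is valid syntax inside a Python call's parentheses.

    `_directive` is a hypothetical stand-in name; the expression is only
    compiled, never evaluated, so the name does not need to exist.
    """
    try:
        compile(f"_directive({body})", "<resources>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # The form from the rule in the question (YAML-style colon).
    print(parses_as_call_arguments("mem_mb_per_cpu: 1500"))
    # The form that worked (keyword-argument style).
    print(parses_as_call_arguments("mem_mb_per_cpu=1500"))
    # Several resources are separated by commas, like any argument list.
    print(parses_as_call_arguments("mem_mb_per_cpu=1500, runtime=120"))
```

The first check fails and the other two pass, which matches what I saw: the YAML-style colon belongs in config or profile files, while inside a rule the resources have to be written with =.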
Upvotes: 3