nhaus

Reputation: 1023

Snakemake: prioritize jobs so that one sample finishes before the next starts

I am using a rather large Snakemake file to call mutations for 30 patients. The first step of the workflow is the alignment. The problem I am running into is that the Snakemake scheduler seems to run the first step (i.e. the alignment) for all 30 patients before anything else. This in turn requires a huge amount of temporary disk space (>>10 TB). This is quite inefficient, because once the workflow finishes, each patient takes up less than 1 GB (only VCF files).

So my question is whether there is a way to "force" Snakemake to finish processing one patient before starting the alignment (the first step) for a new patient, while still parallelizing everything.

I tried the --prioritize option to prioritize the last rule of the workflow, but that did not seem to do the trick.

Any help is much appreciated!

Cheers!

Upvotes: 1

Views: 531

Answers (1)

dariober

Reputation: 9062

"This in turn requires a huge amount of (temporarily) disk space"

I think you can set the disk_mb resource in such a way that snakemake will not exceed it.

For example, if you have 100 GB of disk space and each alignment takes (at most) 30 GB, the following should constrain snakemake to run at most 3 alignments at the same time (assuming the next steps require negligible space; edit as required):

rule align:
    input: 
        ...
    output: 
        ...
    resources:
        # Peak disk usage of one alignment job, in MB (about 30 GB)
        disk_mb=30000
    ...

Run as:

snakemake --resources disk_mb=100000 ...
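The arithmetic behind that limit can be sketched in plain Python. This is only an illustration of the budget check, not Snakemake's actual scheduler code, and the function name max_concurrent_jobs is made up for this example:

```python
def max_concurrent_jobs(total_disk_mb: int, per_job_disk_mb: int) -> int:
    """Return how many jobs fit into the disk budget at the same time.

    total_disk_mb   -- the global limit, as passed via --resources disk_mb=...
    per_job_disk_mb -- the disk_mb value declared in the rule's resources
    (Hypothetical helper for illustration only.)
    """
    if per_job_disk_mb <= 0:
        raise ValueError("per_job_disk_mb must be positive")
    # Jobs are only started while the sum of their disk_mb stays within
    # the global limit, so the count is the integer quotient.
    return total_disk_mb // per_job_disk_mb


# Numbers from the example above: 100 GB budget, 30 GB per alignment.
print(max_concurrent_jobs(100_000, 30_000))
```

With these numbers a fourth alignment would need 120000 MB in total, which is over the 100000 MB limit, so it has to wait until a running alignment finishes.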

The answer at Snakemake: Tranverse DAG depth-first? should work but then you will have to run 1 job at a time even when jobs need little disk space.

Upvotes: 2
