Masih

Reputation: 980

Set cluster core per rule in Snakemake

I need to download hundreds of large files and run each of them through my Snakemake pipeline. The download is fast compared to the downstream processing. I'd like to limit the number of parallel downloads to 5 while allowing the downstream processing to use 100 cores. In Snakemake, is there a way to limit the number of cores used by a certain rule? I picture 5 cores constantly grabbing data while my other cores work on the data I've already downloaded. If I run Snakemake as usual with 100 cores, it tries to download all the files at once and overloads the server. I already tried adding 'threads: 1' to the download rule, but it does not work as I expected. I assumed that 'threads: 1' would give the same result as running that rule with '-j 1' on the command line, but the results differ. A sketch of the kind of rule I mean is below.
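This is only a sketch; the file pattern, URL, and rule name are placeholders, not my actual pipeline:

rule download:
    output: "data/{sample}.gz"
    threads: 1    # one core per download job; this does not cap how many download jobs run in parallel
    shell:
        "wget -O {output} https://example.com/{wildcards.sample}.gz"

With -j 100 this still starts as many downloads in parallel as there are free cores, because threads: 1 only declares what each individual job needs.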

Upvotes: 1

Views: 1145

Answers (1)

Foldager

Reputation: 519

You can use resources to limit how many jobs of a rule run in parallel. You can name the resources as you please; see the resources documentation. Here is an example using a resource named download_streams.

Snakefile:

rule r1:
    output: touch("{field}.txt")
    resources: download_streams=1    # each job of this rule consumes 1 unit of download_streams
    shell:
        "sleep 2; "
        "echo $(date '+%H:%M:%S') Finished downloading {output}"

Running snakemake download_{1..10}.txt --resources download_streams=2 -j 10 > log.txt gives the following in log.txt

12:00:58 Finished downloading download_1.txt
12:00:58 Finished downloading download_5.txt
12:01:00 Finished downloading download_6.txt
12:01:00 Finished downloading download_8.txt
12:01:02 Finished downloading download_9.txt
12:01:02 Finished downloading download_10.txt
12:01:04 Finished downloading download_3.txt
12:01:04 Finished downloading download_2.txt
12:01:06 Finished downloading download_4.txt
12:01:06 Finished downloading download_7.txt
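Applied to the scenario in the question, a sketch along these lines (the download URL, tool name, and file patterns are placeholders) would cap downloads at 5 while the remaining cores work on already-downloaded files:

rule download:
    output: "data/{sample}.gz"
    resources: download_streams=1    # each download job claims 1 of the 5 download slots
    shell:
        "wget -O {output} https://example.com/{wildcards.sample}.gz"

rule process:
    input: "data/{sample}.gz"
    output: "results/{sample}.out"
    threads: 10    # cores used by each processing job
    shell:
        "my_tool --threads {threads} {input} > {output}"

Invoked with snakemake --resources download_streams=5 -j 100 (plus your usual targets), Snakemake runs at most 5 download jobs at a time and uses the remaining cores for processing jobs.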

Upvotes: 1
