Maarten-vd-Sande

Reputation: 3711

Snakemake: variable that defines whether a process is a submitted cluster job or the main Snakefile

My current setup is that, at the start of my Snakefile, I call a long-running function somefunc that helps decide the input to rule all. When running the workflow with SLURM, I noticed that somefunc is executed by every submitted job, not just once. Is there some variable I can check that tells me whether the code is running in a submitted job or in the main process? Something like:

if not snakemake.submitted_job:
    config['layout'] = somefunc()

...

Upvotes: 1

Views: 167

Answers (2)

Maarten-vd-Sande

Reputation: 3711

As discussed with @dariober, it seems cleanest to check whether the hidden .snakemake directory contains locks, since these do not seem to be generated until the first rule starts (assuming you are not using the --nolock argument).

import os
locked = len(os.listdir(".snakemake/locks")) > 0

However this results in a problem in my case:

import time
import os


def longfunc():
    time.sleep(10)
    return range(5)

locked = len(os.listdir(".snakemake/locks")) > 0
if not locked:
    info = longfunc()


rule all:
    input:
        expand("test_{sample}", sample=info)



rule test:
    output:
        touch("test_{sample}")
    shell:
        """
        sleep 1
        """

It turns out Snakemake re-evaluates the complete Snakefile for each submitted job, so all the jobs complain that 'info' is not defined. For me the easiest fix was to store the result on disk in the main process and load it in each job (pickle.dump and pickle.load), as in the sketch below.
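A minimal sketch of that pickle-based workaround, building on the lock check above (layout.pkl is just an example filename, not anything Snakemake-specific):

import os
import pickle

locked = len(os.listdir(".snakemake/locks")) > 0

if not locked:
    # main process: run the expensive function once and cache the result
    info = longfunc()
    with open("layout.pkl", "wb") as f:
        pickle.dump(info, f)
else:
    # submitted job: the Snakefile is re-evaluated, so load the cached result
    with open("layout.pkl", "rb") as f:
        info = pickle.load(f)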

Upvotes: 1

dariober

Reputation: 9062

A solution which I don't really recommend is to make somefunc write the list of inputs to a tmp file, so that SLURM jobs read this tmp file rather than reconstructing the list from scratch. The tmp file is created by whichever job is executed first, so the long-running part is done only once.

At the end of the workflow delete the tmp file so that later executions will start fresh with new input.

Here's a sketch:

import os

def somefunc():
    try:
        all_output = open('tmp.txt').readlines()
        all_output = [x.strip() for x in all_output]
        print('List of input files read from tmp.txt')
    except FileNotFoundError:
        all_output = ['file1.txt', 'file2.txt'] # Long running part
        with open('tmp.txt', 'w') as fout:
            for x in all_output:
                fout.write(x + '\n')
        print('List of input files created and written to tmp.txt')
    return all_output

all_output = somefunc()

rule all:
    input:
        all_output,

rule one:
    output:
        all_output,
    shell:
        r"""
        touch {output}
        """

onsuccess:
    os.remove('tmp.txt')
onerror:
    os.remove('tmp.txt')

Since jobs will be submitted in parallel, you should make sure that only one job writes tmp.txt and the others read it. I think the try/except above will do it, but I'm not 100% sure. (You probably want a better filename than tmp.txt; see the tempfile module, and also the atexit module for exit handlers.)
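If you are worried about two jobs writing the file at the same time, one option (just a sketch, the write_inputs_atomically helper is hypothetical, not part of Snakemake) is to write to a temporary file and atomically rename it into place with os.replace, so readers never see a half-written file:

import os
import tempfile

def write_inputs_atomically(all_output, path='tmp.txt'):
    # Write to a temporary file in the same directory, then atomically
    # rename it into place so other jobs never read a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with os.fdopen(fd, 'w') as fout:
        for x in all_output:
            fout.write(x + '\n')
    os.replace(tmp_path, path)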

Upvotes: 1
