donkey

Reputation: 1448

Parallelizing checkpoint calls

At the moment, my Snakemake workflow has a function that generates all the final output paths. This function uses two checkpoints:

def make_all_paths(wc):
    contigs = get_contigs(wc.genome_name)  # This function also calls a checkpoint, no loop
    motifs = []  # collected across all contigs
    for c in contigs:
        tsv_path = checkpoints.find_motifs_contig.get(genome_name=wc.genome_name, contig=c).output[0]

        # parse that .tsv
        try:
            with open(tsv_path) as tsv:
                # process the file and append the results to motifs
                ...
        except FileNotFoundError as e:  # Required to bypass known bug
            from snakemake.exceptions import IncompleteCheckpointException
            raise IncompleteCheckpointException(rules.find_motifs_contig, e.filename)

    return ...  # [list of file paths dependent on motifs and contigs]


rule make_all:
    input:
        make_all_paths
    output:
        touch("{genome_name}.sentinel")

rule all:
    input:
        expand("{genome_name}.sentinel", genome_name=GENOME_NAMES)

My issue is that this limits parallelization: on each iteration the checkpoint has to be called and its results returned before the loop can move on and call it again for the next contig, and so on.
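To make the behaviour concrete, here is a minimal pure-Python sketch of what I believe is happening. It is a toy model, not Snakemake's real scheduler: `Incomplete`, `get_checkpoint`, `looped_input` and `rounds_needed` are hypothetical stand-ins, and the "one input function per contig" layout is only an assumption about how the work could be split.

```python
class Incomplete(Exception):
    """Stand-in for Snakemake's IncompleteCheckpointException (toy model only)."""

    def __init__(self, target):
        super().__init__(target)
        self.target = target


def get_checkpoint(contig, done):
    """Mimic checkpoints.<name>.get(): raise if the output is not there yet."""
    if contig not in done:
        raise Incomplete(contig)
    return f"{contig}.tsv"


def looped_input(contigs, done):
    """One input function looping over every contig, as in my workflow."""
    return [get_checkpoint(c, done) for c in contigs]


def rounds_needed(input_functions):
    """Re-evaluate all input functions until none raises; count the rounds.

    In each round, every function that raises reports one pending checkpoint,
    and all checkpoints reported in the same round are assumed to run together.
    """
    done, rounds = set(), 0
    while True:
        pending = set()
        for func in input_functions:
            try:
                func(done)
            except Incomplete as exc:
                pending.add(exc.target)
        if not pending:
            return rounds
        done |= pending  # these checkpoint jobs could run in parallel
        rounds += 1


contigs = ["c1", "c2", "c3", "c4"]

# One function with a loop: the first missing contig aborts the loop,
# so only one checkpoint is discovered per round.
serial_rounds = rounds_needed([lambda done: looped_input(contigs, done)])

# One function per contig (hypothetical layout): every missing contig
# is discovered in the same round.
per_contig = [lambda done, c=c: get_checkpoint(c, done) for c in contigs]
parallel_rounds = rounds_needed(per_contig)

print(serial_rounds, parallel_rounds)  # prints "4 1"
```

In this model the looped function needs one round per contig, while the per-contig layout needs a single round, which is the kind of behaviour I am after.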

I would like to restructure this so that Snakemake can parallelize those calls. Any recommendations?
