Reputation: 1448
At the moment my snakemake workflow has a function which generates all the final output paths. This function uses two checkpoints:
def make_all_paths(wc):
    contigs = get_contigs(wc.genome_name)  # This function also calls a checkpoint, no loop
    motifs = []
    for c in contigs:
        tsv_path = checkpoints.find_motifs_contig.get(genome_name=wc.genome_name, contig=c).output[0]
        try:
            with open(tsv_path) as fh:
                # parse the .tsv, append results to motifs
                ...
        except FileNotFoundError as e:  # Required to bypass known bug
            from snakemake.exceptions import IncompleteCheckpointException
            raise IncompleteCheckpointException(rules.find_motifs_contig, e.filename)
    return ...  # list of file paths dependent on motifs and contigs
rule make_all:
    input:
        make_all_paths
    output:
        touch("{genome_name}.sentinel")

rule all:
    input:
        expand("{genome_name}.sentinel", genome_name=GENOME_NAMES)
My issue is that this limits parallelization: the checkpoint has to be called for one contig, its result returned, and that loop iteration completed before the checkpoint can be called for the next contig, and so on. I would like to modify this so that snakemake can parallelize those calls. Any recommendations?
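Is restructuring along these lines the right direction? The idea would be to request all per-contig TSVs at once through a per-genome gather checkpoint, so the find_motifs_contig jobs can be scheduled together and only one .get() is needed per genome. The motifs/{genome_name}/{contig}.tsv path, the gather_motifs checkpoint and the parse_motifs helper below are placeholders I made up for illustration, not my actual rule names or paths.

    def gather_motif_tsvs(wc):
        # Only the checkpoint inside get_contigs() is queried here; the per-contig
        # TSVs are requested as plain inputs, so they can run in parallel.
        contigs = get_contigs(wc.genome_name)
        return expand(
            "motifs/{genome_name}/{contig}.tsv",  # placeholder for find_motifs_contig's output pattern
            genome_name=wc.genome_name,
            contig=contigs,
        )

    checkpoint gather_motifs:
        input:
            gather_motif_tsvs
        output:
            "motifs/{genome_name}.all_motifs.tsv"
        shell:
            "cat {input} > {output}"

    def make_all_paths(wc):
        # A single .get() per genome instead of one per contig
        combined = checkpoints.gather_motifs.get(genome_name=wc.genome_name).output[0]
        with open(combined) as fh:
            motifs = parse_motifs(fh)  # placeholder parser
        return ...  # list of file paths dependent on motifs and contigs

If I went this way, I assume find_motifs_contig could become a plain rule again, since nothing would call .get() on it any more?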
Upvotes: 0
Views: 15