Reputation: 177
I wrote my first snakemake rule that uses a python script for processing files:
rule sanitize_labels:
    input:
        "data/raw/labels/rois_essence_31_10_2019_final.shp",
        "data/raw/labels/pts_carte_auto_final.shp"
    output:
        "data/interim/labels/rois_essence_31_10_2019_final.csv",
        "data/interim/labels/pts_carte_auto_final.csv"
    params:
        crs = 32189,
        log = True
    script:
        "../../scripts/data/sanitize_labels.py"
It runs successfully for the first file, then stops with this message:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 9 of E:\code\projects\essences\workflow\rules\preprocessing.smk:
Missing files after 5 seconds:
data/interim/labels/pts_carte_auto_final.csv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job sanitize_labels since they might be corrupted:
data/interim/labels/rois_essence_31_10_2019_final.csv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: E:\code\projects\essences\.snakemake\log\2020-02-10T025157.458955.snakemake.log
I tried swapping file order both in input and output; always only the first file gets processed.
In my python script, I refer to input and output as snakemake.input[0] and snakemake.output[0]. If I understand correctly snakemake.input[0] is assigned the current input in each call of the script (no matter what's the number of inputs in the rule). Same goes for snakemake.output[0]. Is that correct? Do you have other hints at what can cause this error?
I'm running snakemake version 5.10.0 (installed as snakemake-minimal from the bioconda channel).
Thanks a lot for any hint.
Upvotes: 0
Views: 630
Reputation: 2069
Adding to Maarten's answer: once you have specified the generic rule he provided, you then request the final outputs you want in a rule 'all', placed as the first rule of your workflow:
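DATASETS = ["rois_essence_31_10_2019_final", "pts_carte_auto_final"]  # base names from the question

rule all:
    input: expand("data/interim/labels/{name}.csv", name=DATASETS)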
If you place this expand() call in the input directive of your generic sanitize_labels rule, the rule is no longer generic: Snakemake expands it into the full list of files, which recreates the rule from your question.
Go through the tutorial again if it's still not clear. While you may think and write your rules from start to finish, snakemake evaluates from finish to start. You request the final outputs in all (as inputs) and snakemake decides what needs to be run. It's confusing at first, but just remember to request your final output in all and keep your rules generic.
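To make the "finish to start" idea concrete, here is a minimal plain-Python sketch of the two steps involved. It is not Snakemake's actual implementation: `expand_targets` and `resolve_input` are hypothetical helpers written for illustration, and the dataset names are taken from the question.

```python
import re

# Base names of the two label files from the question.
DATASETS = ["rois_essence_31_10_2019_final", "pts_carte_auto_final"]

# Patterns of the generic rule (output first, because that is what gets matched).
OUTPUT_PATTERN = "data/interim/labels/{name}.csv"
INPUT_PATTERN = "data/raw/labels/{name}.shp"


def expand_targets(pattern, names):
    """Roughly what expand(pattern, name=names) produces for one wildcard."""
    return [pattern.format(name=n) for n in names]


def resolve_input(target):
    """Simplified stand-in for wildcard resolution.

    Match a requested output file against the output pattern, extract the
    wildcard value, and fill it into the input pattern.
    """
    regex = re.escape(OUTPUT_PATTERN).replace(re.escape("{name}"), "(?P<name>.+)")
    match = re.fullmatch(regex, target)
    if match is None:
        raise ValueError("no rule produces " + target)
    return INPUT_PATTERN.format(name=match.group("name"))


# The targets requested in rule 'all', and the input each one leads back to.
for target in expand_targets(OUTPUT_PATTERN, DATASETS):
    print(target, "<-", resolve_input(target))
```

Each requested target leads to a separate job of the generic rule, which is why the rule itself should only describe one file.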
Upvotes: 1
Reputation: 3701
I think you need to take a look again at the "idea" behind snakemake. Probably what you need is something like this:
rule sanitize_labels:
    input:
        "data/raw/labels/{name}.shp"
    output:
        "data/interim/labels/{name}.csv"
    params:
        crs = 32189,
        log = True
    script:
        "../../scripts/data/sanitize_labels.py"
Here you do not specify the exact filenames; you tell Snakemake how it can generate a certain output from a certain input. In this case, if you request both data/interim/labels/rois_essence_31_10_2019_final.csv and data/interim/labels/pts_carte_auto_final.csv, Snakemake "understands" how to make these files, and it knows which inputs it needs. The script then runs once per file, so snakemake.input[0] and snakemake.output[0] refer to that job's single input and output.
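This also explains the MissingOutputException from the question. A rule that lists both files creates a single job, so the script is called once and sees both inputs; index 0 is only the first of them. The sketch below imitates that with `types.SimpleNamespace` as a stand-in for the `snakemake` object that Snakemake injects into a script. The stand-in and the helper `outputs_never_written` are assumptions made for illustration, not part of Snakemake.

```python
from types import SimpleNamespace

NAMES = ["rois_essence_31_10_2019_final", "pts_carte_auto_final"]


def outputs_never_written(job):
    """Outputs left untouched by a script that only uses job.output[0]."""
    return list(job.output[1:])


# Rule from the question: ONE job that carries both files.
original_job = SimpleNamespace(
    input=["data/raw/labels/%s.shp" % n for n in NAMES],
    output=["data/interim/labels/%s.csv" % n for n in NAMES],
)

# Generic wildcard rule: one job PER file, each with a single input and output.
generic_jobs = [
    SimpleNamespace(
        input=["data/raw/labels/%s.shp" % n],
        output=["data/interim/labels/%s.csv" % n],
    )
    for n in NAMES
]

# The second CSV is never produced by the original job.
print(outputs_never_written(original_job))
# Every per-file job is fully covered by index 0.
print([outputs_never_written(job) for job in generic_jobs])
```

If you wanted to keep the original two-file rule instead, the script would have to loop over the pairs of snakemake.input and snakemake.output rather than use index 0 only.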
Upvotes: 1