I have a large Snakefile that looks like this (heavily simplified):
rule a:
    input: '{path}.csv'
    output: '{path}.a.csv'
    shell: 'cp {input} {output}'

rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'

rule c:
    input: '{path}.csv'
    output: '{path}.c.csv'
    shell: 'cp {input} {output}'

rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'

rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv'
(This setup lets me use rules like functions, by chaining their filename suffixes in the all rule.)
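To make the chaining concrete, here is a minimal Python sketch that expands one target name into the sequence of files Snakemake has to build for it. The helper name chain_files is hypothetical (it is not part of Snakemake), and it assumes the raw file's base name contains no dots.

```python
def chain_files(target: str, ext: str = ".csv") -> list[str]:
    """List every file in the chain, from the raw input up to `target`."""
    if not target.endswith(ext):
        raise ValueError(f"{target!r} does not end with {ext!r}")
    stem = target[: -len(ext)]          # e.g. 'raw1.a.b.c.a.d'
    base, *suffixes = stem.split(".")   # 'raw1', ['a', 'b', 'c', 'a', 'd']
    files = [base + ext]                # the raw input comes first
    for suffix in suffixes:
        # Each suffix corresponds to one rule applied to the previous file.
        base = f"{base}.{suffix}"
        files.append(base + ext)
    return files


for name in chain_files("raw1.a.b.c.a.d.csv"):
    print(name)
```

Every name after the first is the output of the rule named by its last suffix, with the previous name as its input.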
Starting state:
$ ls -tr1
Snakefile
raw1.csv
raw2.csv
$ snakemake all
...
After:
$ ls -tr1
Snakefile
raw1.csv
raw2.csv
raw2.a.csv
raw2.a.b.csv
raw2.a.b.c.csv
raw2.a.b.c.d.csv
raw1.a.csv
raw1.a.b.csv
raw1.a.b.c.csv
raw1.a.b.c.a.csv
raw1.a.b.c.a.d.csv
raw2.a.b.c.d.a.csv
Now I'd like to add a rule that deletes specific intermediate files (for example raw1.a.csv and raw2.a.b.csv), because I don't need them and they take up a lot of disk space. I can't mark the outputs with temp() because of the wildcard {path}.
Any tips? Thanks.
Upvotes: 0
Views: 1104
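EDIT: Actually, this solution doesn't work. It results in a race condition, presumably because the remove rule depends only on the file it deletes, so nothing stops it from running before the rules that still need that file as input.
OK, I figured it out: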
rule a:
    input: '{path}.csv'
    output: '{path}.a.csv'
    shell: 'cp {input} {output}'

rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'

rule c:
    input: '{path}.csv'
    output: '{path}.c.csv'
    shell: 'cp {input} {output}'

rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'

rule remove:  # <-- rule to delete a file
    input: '{path}'
    output: touch('{path}.removed')
    shell: 'rm {input}'

rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv',
           'raw1.a.csv.removed',      # <-- specify which files to rm
           'raw2.a.b.c.csv.removed',  # <-- specify which files to rm
And here's the DAG:
$ snakemake --dag all | dot -Tpng > dag.png
Upvotes: 0
temp() does work in this scenario.
rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv'

rule a:
    input: '{path}.csv'
    output: temp('{path}.a.csv')
    shell: 'cp {input} {output}'

rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'

rule c:
    input: '{path}.csv'
    output: temp('{path}.c.csv')
    shell: 'cp {input} {output}'

rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'
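Executing this creates the files raw1.a.b.c.a.d.csv, raw1.a.b.csv, raw2.a.b.c.d.csv and raw2.a.b.csv, and automatically deletes the files raw1.a.csv, raw2.a.csv, raw1.a.b.c.csv, raw2.a.b.c.csv, raw1.a.b.c.a.csv and raw2.a.b.c.d.a.csv. Note that the last of these, raw2.a.b.c.d.a.csv, is one of the targets listed in rule all; it is deleted too because it is an output of rule a.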
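As a rough cross-check of that list, the following Python sketch predicts which files survive purely from the file names, assuming that the last suffix of each file identifies the rule that produced it and that every output of a temp() rule ends up deleted. The function predict_cleanup is a hypothetical helper, not a Snakemake API, and it models only the naming scheme, not Snakemake's scheduler.

```python
def predict_cleanup(targets, temp_rules, ext=".csv"):
    """Split the files built for `targets` into (kept, deleted) sets."""
    kept, deleted = set(), set()
    for target in targets:
        # Assumes the raw base name (e.g. 'raw1') contains no dots.
        base, *suffixes = target[: -len(ext)].split(".")
        for suffix in suffixes:
            base = f"{base}.{suffix}"
            # The last suffix names the rule that produced this file.
            (deleted if suffix in temp_rules else kept).add(base + ext)
    return kept, deleted


kept, deleted = predict_cleanup(
    ["raw1.a.b.c.a.d.csv", "raw2.a.b.c.d.a.csv"],
    temp_rules={"a", "c"},  # rules whose outputs are wrapped in temp()
)
print("kept:   ", sorted(kept))
print("deleted:", sorted(deleted))
```

Changing temp_rules shows quickly that temp() acts per rule rather than per file: marking rule a removes every *.a.csv file in every chain.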
Upvotes: 1