soungalo
soungalo

Reputation: 1338

Snakemake - delete all non-output files produced by a workflow

I have a workflow that produces tons of files, most of them are not the output of any rule (they are intermediate results). I'd like to have the option of deleting everything that is not the output of any rule after the workflow is complete. This would be useful for archiving.
Right now the only way I found to do that is to define all outputs of all rules as protected, and then run snakemake --delete-all-output. Two questions:
1. Is this the way to go, or is there a better solution?
2. Is there a way to automatically define all outputs as protected, or do I have to go through the entire code and wrap all outputs with protected()?

Thanks!

Upvotes: 3

Views: 1611

Answers (2)

MSR
MSR

Reputation: 2881

In addition to @dariober's suggestion, here's a few ideas:

  • It sounds like you know this already, but you could wrap unneeded output in temp(), which will cause Snakemake to delete it automatically. You can combine this with --notemp for debugging. With temp(), deletion will happen progressively, not after the workflow is complete.
  • Another option may be to use the onsuccess hook defined by snakemake. From the docs, "The onsuccess handler is executed if the workflow finished without error." So, say, if throughout the workflow, you put unneeded file in a temp/ folder or similar, you could use shutil.rmtree("temp") in onsuccess, which would delete all your unneeded files only after the workflow finished successfully, as you require. (Note also the similar onerror, should you need it.)

Upvotes: 2

dariober
dariober

Reputation: 9062

Maybe the option --list-untracked helps?

  --list-untracked, --lu
                        List all files in the working directory that are not
                        used in the workflow. This can be used e.g. for
                        identifying leftover files. Hidden files and
                        directories are ignored.

Upvotes: 3

Related Questions