Reputation: 89
What is the best practice for writing a rule (or rules) for a script that produces many intermediate files?
A sample R script looks like this:
data <- read_rds(snakemake@input[["data"]])
# generate and save many plots for sanity check
plt1
ggsave("plt1.pdf", plt1)
plt2
ggsave("plt2.pdf", plt2)
# and many other figs
# No actual output
The rule I wrote:
rule transform:
    input:
        data="data.rds"
    output:
        touch("script.Rout")
    script:
        "script.R"
Notice that there is no actual output from script.R. The script is mainly used for sanity checks (done manually after running this rule) by plotting some draft figures. The figures will not be used as inputs for any later part of the workflow.
In this case, is my solution proper? Are there better approaches?
Thank you!
Upvotes: 1
Views: 314
Reputation: 7819
A Snakemake rule doesn't actually need to declare any output, so you could simply leave out the output bit.
In general, it's good to be explicit about the output of your rules, for the following reasons:
Snakemake can clean up output files for you (see --delete-all-output), but only if the output has been declared.
Another reason to declare and create output (for example through touch, as you do) is so that the rule can be triggered automatically by another rule. Say you have a "rule all" that triggers your actual data analysis, but you want it to create the plots, too. You could then declare script.Rout as an input of that rule, and the script would be run automatically.
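As a rough sketch of that idea (the layout of "rule all" here is an assumption, not taken from the question), the marker file can be requested as a target so that the plotting rule runs along with everything else:

```snakemake
# Sketch only: "rule all" is assumed to be the first (default) rule in the Snakefile.
rule all:
    input:
        # Targets of the actual data analysis would be listed here as well.
        "script.Rout"  # requesting the marker file makes Snakemake run "transform"

rule transform:
    input:
        data="data.rds"
    output:
        # touch() creates an empty marker file once script.R finishes successfully
        touch("script.Rout")
    script:
        "script.R"
```

With this layout, running snakemake without naming a target builds "all", which in turn runs script.R because script.Rout does not exist yet.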
Since script.Rout is an empty file, potentially declared only so that another rule can trigger this one, you could wrap it in temp(), so that it gets removed at the end and you don't have to clean it up yourself:
output: temp(touch("script.Rout"))
Upvotes: 3
Reputation: 2059
Since you don't use the files for anything else, it should be ok.
There is an argument for explicitly listing the outputs, though. That would let you remove all output files through Snakemake, and Snakemake would detect cases where your script failed silently halfway through. If the outputs are really named figure{i}.pdf, it's as easy as adding expand('figure{i}.pdf', i=range(MAX)) to the output.
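A minimal sketch of what that could look like for the script in the question, assuming the figures are named plt1.pdf, plt2.pdf and so on (N_FIGURES is a hypothetical name standing in for MAX, and its value is only a guess at how many figures the script writes):

```snakemake
# Hypothetical count; set it to the number of figures script.R actually saves.
N_FIGURES = 2

rule transform:
    input:
        data="data.rds"
    output:
        # Expands to plt1.pdf, plt2.pdf; these must match the paths given to ggsave().
        expand("plt{i}.pdf", i=range(1, N_FIGURES + 1))
    script:
        "script.R"
```

If script.R stops before writing one of the listed files, Snakemake reports the missing output and marks the job as failed, instead of the failure going unnoticed.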
Upvotes: 2