Kallas

Reputation: 89

Snakemake with a script generating many intermediate files

What is the best practice for writing a rule (or rules) for a script that generates many intermediate files?

An example R script looks like this:


library(readr)    # provides read_rds()
library(ggplot2)  # provides ggsave()

data <- read_rds(snakemake@input[["data"]])

# generate and save many plots for sanity check
plt1
ggsave("plt1.pdf", plt1) 

plt2
ggsave("plt2.pdf", plt2)

# and many other figs

# No actual output

The rule I wrote:

rule transform:
  input:
    data="data.rds"
  output:
    touch("script.Rout")
  script: "script.R"

Notice that script.R produces no actual output. The script is mainly used for sanity checks (done manually after running this rule) by plotting some draft figures. The figures will not be used as inputs to any later steps of the workflow.

In this case, is my solution appropriate? Are there better approaches?

Thank you!

Upvotes: 1

Views: 314

Answers (2)

Cornelius Roemer

Reputation: 7819

A Snakemake rule doesn't actually need to declare any output, so you could simply leave out the output bit.

In general, it's good to be explicit about the output of your rules for the following reasons:

  • Snakemake checks whether the output is actually produced and raises an informative error if the output is missing after the rule has finished.
  • Snakemake can delete all output that is produced (--delete-all-output) but only if the output has been declared.

Another reason to declare and create output (like through touch as you do) is so that the rule can automatically be called by another rule. Say you have a "rule all" that triggers your actual data analysis, but you want it to create plots, too. You could then declare script.Rout as an input of this rule, and the script would automatically be called.

Since script.Rout is an empty file, potentially declared only so that another rule can automatically trigger this one, you could wrap it in temp(), so that it gets removed at the end and you don't have to clean it up yourself:

output: temp(touch("script.Rout"))

Upvotes: 3

Troy Comi

Reputation: 2059

Since you don't use the files for anything else, it should be ok.

There is an argument for explicitly listing the outputs, though. That would provide support for removing all output files through Snakemake and would detect cases where your script failed silently halfway through. If the outputs are really figure{i}.pdf, it's as easy as adding an expand('figure{i}.pdf', i=range(MAX)).
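
To make the explicit-output idea a bit more concrete, here is a minimal plain-Python sketch of what that expand call evaluates to and of the "are all declared files present?" check that declaring outputs buys you. MAX = 3, the figure naming, and the missing_figures helper are assumptions for illustration, not part of the question or of Snakemake's API; inside a Snakefile you would put the expand(...) call in the rule's output and let Snakemake do the check itself.

```python
from pathlib import Path
import tempfile

# Assumption: the script writes figure0.pdf ... figure{MAX-1}.pdf.
# MAX is a hypothetical count chosen for this example.
MAX = 3

# The same list that expand("figure{i}.pdf", i=range(MAX)) yields in a Snakefile.
EXPECTED_FIGURES = [f"figure{i}.pdf" for i in range(MAX)]


def missing_figures(directory, expected=EXPECTED_FIGURES):
    """Return the expected figure files that do not exist in `directory`."""
    return [name for name in expected if not (Path(directory) / name).exists()]


if __name__ == "__main__":
    # Simulate a script that stopped after writing the first two figures.
    with tempfile.TemporaryDirectory() as workdir:
        for name in EXPECTED_FIGURES[:2]:
            (Path(workdir) / name).touch()
        print(missing_figures(workdir))
```

In this simulated run only figure2.pdf is reported as missing, which is the kind of silent partial failure that goes unnoticed when the rule's only output is a touched marker file.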

Upvotes: 2
