Reputation: 2712
I have a Snakemake recipe which contains a very expensive preparatory step, common to all of its invocations. Here is a pseudorule for demonstration's sake:
rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import numpy as np
        import somemodule
        data = somemodule.Loader("some_big_data")  # expensive
        np.savez(output[0], data.process(input[0]))  # also expensive
At the moment data is loaded de novo for every target, which is pretty suboptimal. How can I make it load only once?
I'm looking for something that would allow rewriting the rule like this:
rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    setup:
        import numpy as np
        import somemodule
        data = somemodule.Loader("some_big_data")  # expensive
    run:
        np.savez(output[0], data.process(input[0]))  # also expensive
or:
rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import numpy as np
        import somemodule
        data = somemodule.Loader("some_big_data")  # expensive
        for job in jobs:
            np.savez(job.output[0],
                     data.process(job.input[0]))  # also expensive
In another question I have described the code that Loader.__init__() is based on.
Upvotes: 5
Views: 103
Reputation: 16561
One possible solution is to create a pickled object with the data of interest. Please research the security considerations of using pickled objects to check that it is acceptable for your case. If it is, then it would be along the following lines:
rule sample:
    input:
        "{name}.config"
    output:
        pickle = "{name}.pickle",
    run:
        import pickle
        import somemodule
        data = somemodule.Loader("some_big_data")  # expensive
        with open(output.pickle, "wb") as f:
            pickle.dump(data, f)
In downstream rules you would reference the pickled file like any other file, just making sure to load it with pickle.load.
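For example, a downstream rule consuming the pickle might look like the following sketch. The rule name, the `process` call, and the `.npz` output are placeholders carried over from the question, not a definitive implementation:

```python
rule process:
    input:
        pickle = "{name}.pickle",
        config = "{name}.config",
    output:
        "{name}.npz"
    run:
        import pickle
        import numpy as np
        # Unpickling the prepared object is cheap compared to
        # constructing somemodule.Loader from scratch in every job.
        with open(input.pickle, "rb") as f:
            data = pickle.load(f)
        np.savez(output[0], data.process(input.config))
```

This way the expensive Loader construction runs once (in the pickling rule), and each per-target job only pays the cost of deserializing the result.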
Upvotes: 1