abukaj

Reputation: 2712

Preprocessing for wildcarded Snakemake rules

I have a Snakemake rule which contains a very expensive preparatory step that is common to all of its invocations. Here is a pseudorule for demonstration's sake:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import numpy as np
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive
        # input and output are list-like in Snakemake, so index the single file
        np.savez(output[0], data.process(input[0]))  # also expensive

At the moment the data is loaded from scratch for every target, which is wasteful. How can I make it load only once?

I am looking for something that would let me rewrite the rule like this:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    setup:
        import somemodule
        
        data = somemodule.Loader("some_big_data")  # expensive
    run:
        np.savez(output, data.process(input))  # also expensive

or:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive
        
        for job in jobs:
            np.savez(job.output,
                     data.process(job.input))  # also expensive

In another question I have described the code Loader.__init__() is based on.

Upvotes: 5

Views: 103

Answers (1)

SultanOrazbayev

Reputation: 16561

One possible solution is to create a pickled object with the data of interest. Please research the security considerations of using pickled objects to check that it is acceptable for your case. If it is, then it would be along the following lines:

rule sample:
    input:
        "{name}.config"
    output:
        pickle = "{name}.pickle",
    run:
        import somemodule
        import pickle
        
        data = somemodule.Loader("some_big_data")  # expensive
        # pickle.dump takes the object first, then a file opened in binary mode
        with open(output.pickle, "wb") as fh:
            pickle.dump(data, fh)

In downstream rules you would reference the pickled file like any other file, just making sure to load it with pickle.load.
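
As a rough illustration of that round trip outside Snakemake, here is a minimal sketch in plain Python. The names load_big_data, write_cache and process_from_cache are hypothetical stand-ins (they are not part of somemodule or Snakemake), and the sketch assumes the loaded object is picklable and that unpickling it is cheaper than rebuilding it:

```python
import pickle
import tempfile
from pathlib import Path


def load_big_data(source):
    """Hypothetical stand-in for the expensive somemodule.Loader(...) call."""
    return {"source": source, "scale": 10}


def write_cache(cache_path, source):
    """Run the expensive step once and pickle its result."""
    data = load_big_data(source)
    with open(cache_path, "wb") as fh:  # pickle needs a binary file object
        pickle.dump(data, fh)


def process_from_cache(cache_path, value):
    """What a downstream job might do: unpickle the cache, then use it."""
    with open(cache_path, "rb") as fh:
        data = pickle.load(fh)
    return data["scale"] * value


cache = Path(tempfile.mkdtemp()) / "loader.pickle"
write_cache(cache, "some_big_data")  # done once
results = [process_from_cache(cache, v) for v in (1, 2, 3)]  # done per target
print(results)
```

Each downstream job still pays the cost of pickle.load, so this only helps if reading the pickle is noticeably faster than constructing the object again.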

Upvotes: 1
