Multiple named inputs in Snakefile

Question

I want to make a pipeline that looks like this:

1. For each dataset, extract some features.
2. Make a unique list of all features.
3. Extract the unique list from all the original datasets.

Here is a basic example of where I am so far:

input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

rule all:
    input: 
        expand('data/{dataset}.processed', dataset=input_dict.keys())

rule extract_master:
    output:
        'data/{dataset}.processed'
    input:
        master = rules.master_list.output,
        dataset = lambda wildcards: input_dict[wildcards.dataset]
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"

rule master_list:
    output:
        'data/master.txt'
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    shell:
        './master_list.py --input {input} --output {output}'

rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"

I get an error:

'Rules' object has no attribute 'master_list'

I don't know how to specify two named inputs, where each input is not a simple string. If there is syntax I can use for the input section in the extract_master rule to fix this, that would be great. Otherwise, any thoughts on a better approach would be gladly received.

Manavalan Gajapathy · Accepted Answer

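The Snakemake documentation on rule dependencies says the following about referring to another rule's output through the rules object:

"Importantly, be aware that referring to rule a here requires that rule a was defined above rule b in the file, since the object has to be known already. This feature also allows to resolve dependencies that are ambiguous when using filenames."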

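That is, in your example, rule master_list must be defined above rule extract_master in the Snakefile. When extract_master is parsed, rules.master_list does not exist yet, which is why you get the "'Rules' object has no attribute 'master_list'" error. The two named inputs themselves are fine as written.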
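As a minimal sketch of that reordering, assuming the scripts and paths from the question stay unchanged, the Snakefile could be arranged as below. Only the order of the rules differs from the original. Rule all is kept first so that it remains the default target.

```
input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

rule all:
    input:
        expand('data/{dataset}.processed', dataset=input_dict.keys())

rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"

rule master_list:
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    output:
        'data/master.txt'
    shell:
        './master_list.py --input {input} --output {output}'

# Defined after master_list, so rules.master_list is already known here.
rule extract_master:
    input:
        master = rules.master_list.output,
        dataset = lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.processed'
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"
```

If you would rather not depend on rule order, writing the filename directly (master = 'data/master.txt') should resolve to the same dependency, since Snakemake matches inputs to outputs by filename.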
