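I want to make a pipeline that looks like this: a chunk is extracted from each dataset, all chunks are combined into a master list, and each dataset is then processed against that master list.
Here is a basic example of where I am so far: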
input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

rule all:
    input:
        expand('data/{dataset}.processed', dataset=input_dict.keys())

rule extract_master:
    output:
        'data/{dataset}.processed'
    input:
        master = rules.master_list.output,
        dataset = lambda wildcards: input_dict[wildcards.dataset]
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"

rule master_list:
    output:
        'data/master.txt'
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    shell:
        './master_list.py --input {input} --output {output}'

rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"
I get an error:
'Rules' object has no attribute 'master_list'
I don't know how to specify two named inputs where neither input is a simple string. If there is syntax I can use in the input section of the extract_master rule to fix this, that would be great. Otherwise, any thoughts on a better approach would be gladly received.
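The Snakemake documentation on rule dependencies (where rule b refers to the output of rule a via rules.a.output) says: "Importantly, be aware that referring to rule a here requires that rule a was defined above rule b in the file, since the object has to be known already. This feature also allows to resolve dependencies that are ambiguous when using filenames."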
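The phrase "the object has to be known already" can be made concrete with a small Python mock. The Rules class, define_rule and extract_master_input below are hypothetical simplifications, not Snakemake's actual implementation; they only assume that rules.<name> exists once that rule has been read, and that a rule's input expressions are evaluated as the Snakefile is read from top to bottom.

```python
from types import SimpleNamespace


class Rules:
    """Hypothetical stand-in for Snakemake's `rules` object: it starts empty
    and gains one attribute per rule as each rule block is read."""


rules = Rules()


def define_rule(name, output):
    # Mimics the parser reaching `rule <name>:`; only from this point on
    # can `rules.<name>` be looked up.
    setattr(rules, name, SimpleNamespace(output=output))


def extract_master_input():
    # Mimics evaluating `master = rules.master_list.output` in the input
    # section of extract_master, which happens when that rule is read.
    return rules.master_list.output


# Question's order: extract_master is read before master_list exists.
try:
    extract_master_input()
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)  # 'Rules' object has no attribute 'master_list'

# Answer's order: master_list is defined first, so the lookup succeeds.
define_rule("master_list", ["data/master.txt"])
master = extract_master_input()
print(master)  # ['data/master.txt']
```

The first lookup fails with the same message as in the question, because the attribute simply does not exist yet when the input section is evaluated.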
That is, in your example, rule master_list should be defined before rule extract_master.
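For reference, here is a sketch of the question's Snakefile with only the rule order changed: master_list now appears above extract_master, and rule all stays first so it remains the default target. The scripts, paths and input_dict are the asker's and are not checked here.

```python
input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

rule all:
    input:
        expand('data/{dataset}.processed', dataset=input_dict.keys())

# Defined before extract_master, so rules.master_list exists when
# extract_master's input section is evaluated.
rule master_list:
    output:
        'data/master.txt'
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    shell:
        './master_list.py --input {input} --output {output}'

rule extract_master:
    output:
        'data/{dataset}.processed'
    input:
        # Two named inputs: one from another rule's output,
        # one from an input function keyed on the wildcard.
        master = rules.master_list.output,
        dataset = lambda wildcards: input_dict[wildcards.dataset]
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"

# Order does not matter here: master_list requests the .chunk files
# by filename, and Snakemake matches them to this rule's output pattern.
rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"
```

Alternatively, writing master='data/master.txt' in extract_master should also work regardless of rule order, since Snakemake resolves that dependency by matching the filename against the output of master_list.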