Reputation: 157
I have built a simple pipeline that takes a couple of files, merges them (well, not really, but let's pretend they are merged) into one whose filename is a simple combination of the two (file1_file2.output), and performs some operations. The pipeline works perfectly if I manually provide the filenames for both file1 and file2, but what I really want to do is something like this:
Let's pretend I have 5 files: A, B, C, D and E. I want to run the pipeline for these pairs: A-D, B-D and C-E. This is the Snakefile:
rule all:
    input:
        expand("output/{file1}_{file2}.output")

rule Paste:
    input:
        F1="{file1}",
        F2="{file2}"
    output:
        out="output/{file1}_{file2}.output"
    shell:
        "paste {input.F1} {input.F2} > {output.out}"
What is the best way to do so?
Upvotes: 2
Views: 921
Reputation: 3368
You have to list the concrete names of the target files in rule all. Snakemake then matches each target against the output pattern of rule Paste and uses the resulting wildcard values to find the input files. It would look like this:
rule all:
    input:
        expand("output/{combination}.output", combination=["A_D","B_D","C_E"])

rule Paste:
    input:
        F1="{file1}",
        F2="{file2}"
    output:
        out="output/{file1}_{file2}.output"
    shell:
        "paste {input.F1} {input.F2} > {output.out}"
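If the list of pairs grows, typing each combination string by hand gets error-prone. A minimal sketch in plain Python of building the same target names from a list of pairs; the names `pairs` and `pair_targets` are made up for illustration and are not part of Snakemake:

```python
# Hypothetical list of (file1, file2) pairs; A..E are the names from the question.
pairs = [("A", "D"), ("B", "D"), ("C", "E")]


def pair_targets(pairs, sep="_"):
    """Return one output path per (file1, file2) pair, joined with `sep`."""
    return [f"output/{first}{sep}{second}.output" for first, second in pairs]


targets = pair_targets(pairs)
print(targets)
```

Since a Snakefile accepts ordinary Python, a list like `targets` could presumably be handed to the input of rule all in place of the expand(...) call, as long as `sep` matches the separator used in the output pattern of rule Paste.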
Please note that a weak separator like "_" might break wildcard resolution if your input file names (A, B, C, D, E) also contain "_". I would use something you are sure will not appear in a file name (e.g. "__", "_-_", or anything appropriate).
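To see why the separator matters: by default a Snakemake wildcard matches the regular expression `.+`, so a name containing the separator can be split in the wrong place. The sketch below only imitates that matching with Python's `re` module; it is not Snakemake's own code, and `split_target` is a hypothetical helper.

```python
import re


def split_target(path, sep="_"):
    """Split 'output/<file1><sep><file2>.output' the way greedy `.+` wildcards would."""
    pattern = rf"output/(?P<file1>.+){re.escape(sep)}(?P<file2>.+)\.output"
    match = re.fullmatch(pattern, path)
    if match is None:
        return None
    return match.group("file1"), match.group("file2")


# Intended pair is ("A", "D_old"), but "_" also occurs inside the second name,
# so the greedy first group swallows "A_D".
ambiguous = split_target("output/A_D_old.output")

# With "__" as the separator the same pair is recovered as intended.
unambiguous = split_target("output/A__D_old.output", sep="__")

print(ambiguous, unambiguous)
```

With "_" the rule would look for input files named "A_D" and "old", which do not exist; with "__" it finds "A" and "D_old".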
Upvotes: 1