AGarofoli

Reputation: 157

Providing a list of paired files as input for Snakemake

I have written a simple pipeline that takes a pair of files, merges them (well, not really, but let's pretend they are merged) into one file whose name is a simple combination of the two (file1_file2.output), and performs some operations. The pipeline works perfectly if I manually provide the filenames for both file1 and file2, but what I really want to do is something like this:

Let's pretend I have 5 files: A, B, C, D and E. I want to run the pipeline for these pairs: A-D, B-D and C-E, and this is the Snakefile:

rule all:
    input:
        expand("output/{file1}_{file2}.output")

rule Paste:
    input:
        F1="{file1}",
        F2="{file2}"
    output:
        out="output/{file1}_{file2}.output"
    shell:
        "paste {input.F1} {input.F2} > {output.out}"

What is the best way to do so?

Upvotes: 2

Views: 921

Answers (1)

Eric C.

Reputation: 3368

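You have to list the concrete names of the target files in rule all. Snakemake then matches each target against the output pattern of rule Paste to fill in the wildcards file1 and file2, and uses them to find the input files. It would look like this: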

rule all:
    input:
        expand("output/{combination}.output", combination=["A_D","B_D","C_E"])

rule Paste:
    input:
        F1="{file1}",
        F2="{file2}"
    output:
        out="output/{file1}_{file2}.output"
    shell:
        "paste {input.F1} {input.F2} > {output.out}"
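
If the pairs are available as a Python list, the target names could be built from that list instead of being typed by hand. This is only a minimal sketch: the names PAIRS and TARGETS are hypothetical, not something Snakemake defines. Plain Python like this can sit at the top of a Snakefile.

```python
# Hypothetical list of the pairs to process (names are assumptions).
PAIRS = [("A", "D"), ("B", "D"), ("C", "E")]

# Build one concrete target path per pair, following the
# "output/{file1}_{file2}.output" pattern used by rule Paste.
TARGETS = [f"output/{file1}_{file2}.output" for file1, file2 in PAIRS]

print(TARGETS)
```

TARGETS could then be given as the input of rule all. Passing zip as the second argument to expand, with file1=["A","B","C"] and file2=["D","D","E"], should produce the same three names.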

Please note that a separator as common as "_" can break wildcard resolution if your input file names (A, B, C, D, E) also contain "_". I would use something you are sure will not appear in a file name (e.g. "__", "_-_", or anything appropriate).
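
To see why the separator matters, the matching can be approximated with plain Python regular expressions. This is a rough sketch rather than Snakemake's actual code: it assumes each wildcard behaves like a named group with the default pattern ".+", and the helper match_pair and the sample file names are made up for illustration.

```python
import re

def match_pair(target, sep):
    """Split a target name into (file1, file2) the way a greedy '.+' would."""
    pattern = (
        r"output/(?P<file1>.+)" + re.escape(sep) + r"(?P<file2>.+)\.output"
    )
    m = re.fullmatch(pattern, target)
    return (m.group("file1"), m.group("file2")) if m else None

# With "_" as the separator, the greedy match splits at the last "_".
print(match_pair("output/sample_1_sample_2.output", "_"))
# ('sample_1_sample', '2')

# With "__" as the separator, there is only one place to split.
print(match_pair("output/sample_1__sample_2.output", "__"))
# ('sample_1', 'sample_2')
```

In the first case the rule would look for input files named sample_1_sample and 2, which is not what was intended.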

Upvotes: 1
