HansVG

Reputation: 35

Snakemake Rule to simplify file names using wildcards

Input Data

.
├── barcode01
│   └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode02
│   └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode03
│   └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode04
│   └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq

Snakemake rule

rule symlink_results_demultiplex:
    input:
        inputdirectory+"/basecall/demultiplex/{sample_demultiplex}/{sample_runid}.fastq"
    output:
        outdirectory+"/mothur/{sample_demultiplex}.fastq"
    threads: 1
    shell:
        "ln -s {input} {output}"

However, this fails because {sample_runid} appears in the input but not in the output, so Snakemake cannot work out its value from the requested output file. I would like each symlink to be named after its barcode only, e.g. barcode01.fastq, dropping the redundant "fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0" part.
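Put differently, the wanted link name depends only on the barcode directory, not on the run-ID file name. A minimal plain-Python sketch of that naming rule (the helper `simplified_name` is hypothetical, not part of Snakemake):

```python
from pathlib import PurePosixPath

def simplified_name(fastq_path: str) -> str:
    """Hypothetical helper: name the link after the parent (barcode) directory."""
    # barcode01/fastq_runid_..._0.fastq -> barcode01.fastq
    return PurePosixPath(fastq_path).parent.name + ".fastq"

print(simplified_name("barcode01/fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq"))
```

Snakemake works in the opposite direction (from the requested output back to the input), which is why the run-ID part has to be looked up rather than inferred from the output name.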

What would be the best way to do this?

Upvotes: 1

Views: 108

Answers (1)

elsherbini

Reputation: 1626

One option is to look up the input file name with an input function that depends only on the {sample_demultiplex} wildcard. The example code below may work, depending on exactly how your folders are set up; as written, it assumes that each {sample_demultiplex} directory contains exactly one .fastq file.
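When such a function assembles the directory path, keep in mind that os.path.join drops every earlier component as soon as it meets one that starts with "/". A small demonstration, using posixpath so the result is the same on every platform ("/data/run1" is a made-up stand-in for inputdirectory):

```python
import posixpath

inputdirectory = "/data/run1"  # hypothetical value, for illustration only

# A component with a leading "/" is treated as absolute and resets the path
bad = posixpath.join(inputdirectory, "/basecall/demultiplex/", "barcode01")
# Plain relative components are appended as expected
good = posixpath.join(inputdirectory, "basecall", "demultiplex", "barcode01")

print(bad)   # /basecall/demultiplex/barcode01
print(good)  # /data/run1/basecall/demultiplex/barcode01
```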

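import os
import glob

def get_symlink_results_demultiplex_input(wildcards):
    # e.g. <inputdirectory>/basecall/demultiplex/barcode01
    # (no leading "/" on the parts: os.path.join discards everything before an absolute component)
    fastq_dir = os.path.join(inputdirectory, "basecall", "demultiplex", wildcards.sample_demultiplex)
    # this assumes there is only ever one fastq file in a directory
    fastq_files = glob.glob(os.path.join(fastq_dir, "*.fastq"))
    if len(fastq_files) != 1:
        raise ValueError(f"expected exactly one .fastq file in {fastq_dir}, found {len(fastq_files)}")
    # absolute path, so the link created by "ln -s" also resolves from the output directory
    return os.path.abspath(fastq_files[0])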

rule symlink_results_demultiplex:
    input:
        get_symlink_results_demultiplex_input
    output:
        outdirectory+"/mothur/{sample_demultiplex}.fastq"
    threads: 1
    shell:
        "ln -s {input} {output}"
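Because an input function is ordinary Python, the lookup can be checked outside Snakemake. The sketch below builds a throwaway copy of the question's layout in a temporary directory and calls a function with the same logic. `find_single_fastq` is a hypothetical test helper, and `types.SimpleNamespace` stands in for the wildcards object Snakemake would pass (an assumption for testing only; it just needs attribute access).

```python
import glob
import os
import tempfile
from types import SimpleNamespace

def find_single_fastq(inputdirectory, wildcards):
    # Same lookup as the input function: exactly one .fastq file per barcode directory
    fastq_dir = os.path.join(inputdirectory, "basecall", "demultiplex", wildcards.sample_demultiplex)
    fastq_files = glob.glob(os.path.join(fastq_dir, "*.fastq"))
    if len(fastq_files) != 1:
        raise ValueError(f"expected exactly one .fastq file in {fastq_dir}, found {len(fastq_files)}")
    return os.path.abspath(fastq_files[0])

runid_name = "fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq"

with tempfile.TemporaryDirectory() as tmp:
    # Recreate barcode01..barcode04, each holding one run-ID file
    for i in range(1, 5):
        barcode_dir = os.path.join(tmp, "basecall", "demultiplex", f"barcode{i:02d}")
        os.makedirs(barcode_dir)
        open(os.path.join(barcode_dir, runid_name), "w").close()

    found = find_single_fastq(tmp, SimpleNamespace(sample_demultiplex="barcode02"))
    relative = os.path.relpath(found, tmp)
    print(relative)
```

Keep in mind that Snakemake evaluates input functions while it builds the job graph, so this approach relies on the demultiplexed files already existing when the workflow starts. If an earlier rule in the same workflow produces them, a checkpoint is probably the better fit.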

Upvotes: 2
