Reputation: 35
Input Data
.
├── barcode01
│ └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode02
│ └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode03
│ └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
├── barcode04
│ └── fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0.fastq
Snakemake rule
rule symlink_results_demultiplex:
    input:
        inputdirectory+"/basecall/demultiplex/{sample_demultiplex}/{sample_runid}.fastq"
    output:
        outdirectory+"/mothur/{sample_demultiplex}.fastq"
    threads: 1
    shell:
        "ln -s {input} {output}"
However, this fails because the input and output do not use the same wildcards: {sample_runid} appears in the input but not in the output. I would like to create a symlink named just barcode01.fastq, dropping the redundant "fastq_runid_6292747b0109c4fa5918c50eb8204bb715f19ad0_0" part.
What would be the best way to do this?
Upvotes: 1
Views: 108
Reputation: 1626
One option would be to look up the input filename in a function that depends only on the {sample_demultiplex} wildcard. This example code might work, depending on exactly how your folders are set up (right now it assumes that each {sample_demultiplex} wildcard only ever corresponds to a single fastq file).
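import os
import glob

def get_symlink_results_demultiplex_input(wildcards):
    # No leading slash on "basecall/demultiplex": os.path.join discards
    # everything before an absolute component, which would drop inputdirectory.
    fastq_dir = os.path.join(inputdirectory, "basecall/demultiplex", wildcards.sample_demultiplex)
    # This assumes there is only ever one fastq file in a directory.
    # The root_dir argument of glob.glob requires Python 3.10 or newer.
    fastq_file = glob.glob("*.fastq", root_dir=fastq_dir)[0]
    return os.path.join(fastq_dir, fastq_file)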
rule symlink_results_demultiplex:
    input:
        get_symlink_results_demultiplex_input
    output:
        outdirectory+"/mothur/{sample_demultiplex}.fastq"
    threads: 1
    shell:
        "ln -s {input} {output}"
Upvotes: 2
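As a variation on the lookup function in the answer above, here is a minimal sketch that fails loudly when a barcode folder does not contain exactly one fastq file, and that also works on Python versions older than 3.10 because it does not use root_dir. The name find_single_fastq and its two parameters are hypothetical and chosen only for illustration. It returns an absolute path because a relative target passed to ln -s is resolved relative to the directory the link lives in, so a relative input path may produce a dangling symlink when the output is in a different folder.

```python
import glob
import os


def find_single_fastq(base_dir, sample):
    """Return the absolute path of the only .fastq file in base_dir/sample.

    Hypothetical helper: base_dir is the folder that holds the barcode
    directories, sample is the barcode directory name (e.g. "barcode01").
    """
    fastq_dir = os.path.join(base_dir, sample)
    # Sort so the result is deterministic regardless of filesystem order.
    matches = sorted(glob.glob(os.path.join(fastq_dir, "*.fastq")))
    if len(matches) != 1:
        # Refuse to guess when the one-file-per-barcode assumption is violated.
        raise ValueError(
            f"expected exactly one .fastq file in {fastq_dir}, found {len(matches)}"
        )
    return os.path.abspath(matches[0])
```

Inside the Snakefile this could be wired in as the input function, for example: lambda wildcards: find_single_fastq(os.path.join(inputdirectory, "basecall/demultiplex"), wildcards.sample_demultiplex). That assumes inputdirectory is defined as in the question.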