Reputation: 241
I'm struggling to integrate my sample sheet (TSV) into my pipeline. Specifically, I want to define the samples wildcard manually instead of reading it from a patch. The reason is that not all samples in a path are supposed to be analysed. Instead, I made a sample sheet that contains the list of samples, the path where to find, reference genome, etc.
The sheet looks like this:
name path reference
sample1 path/to/fastq/files mm9
sample2 path/to/fastq/files mm9
I load the sheet in my snakefile
:
table_samples = pd.read_table(config["samples"], index_col="name")
SAMPLES = table_samples.index.values.tolist()
The first rule is supposed to merge the FASTQ files inside, so it would be nice to do something like this:
rule merge_fastq:
output: "{sample}/{sample}.fastq.gz"
params: path = table_samples['path'][{sample}]
shell: """
cat {params.path}/*.fastq.gz > {output}
"""
But as written above it won't work because the sample wildcard is not defined. Is there a way I can say the sample list I defined above (SAMPLES) contains all the samples for which rules should be executed?
I honestly feel stupid asking this question but I've already spent a couple of hours finding/searching a solution and at this point I need to be a bit more time efficient :D
Thanks!
Upvotes: 0
Views: 1230
Reputation: 3368
You just need a target rule listing all the concrete files you want after your rule "merge_fastq":
rule all:
input: expand("{sample}/{sample}.fastq.gz",sample=SAMPLES)
This rule must be put at the top of the other rules. Wildcards can only be used if you define the concrete files you want at the end of the workflow.
Upvotes: 3