Reputation: 23
I'm trying to create a Snakemake pipeline whose outputs are determined by the set of sequencing files present in a particular folder. My project directory is laid out something like this:
project_dir
    > Snakefile
    > code
        > python_scripts
            > ab1_to_fastq.py
    > data
        > 1.ab1_files
            > A.ab1
            > B.ab1
            > C.ab1
        > 2.fastq_files
Here's the code for my actual Snakefile:
import glob
import os

def collect_reads():
    ab1_files = glob.glob("data/1.ab1_files/*.ab1")
    ab1_files.sort()
    ab1_reads = [ab1_file.split('/')[-1].replace('.ab1', '') for ab1_file in ab1_files]
    return ab1_reads

READS = collect_reads()
print(expand("data/2.fastq_files/{read}.fastq", read=READS))

rule convert_ab1_to_fastq:
    input:
        ab1="data/1.ab1_files/{read}.ab1"
    output:
        fastq="data/2.fastq_files/{read}.fastq"
    shell:
        "python code/python_scripts/ab1_to_fastq.py --ab1 {input.ab1} --fastq {output.fastq}"

rule all:
    input:
        fastq=expand("data/2.fastq_files/{read}.fastq", read=READS)
My understanding is that all should be my target rule, and that the input variable fastq in that rule evaluates to
['data/2.fastq_files/A.fastq', 'data/2.fastq_files/B.fastq', 'data/2.fastq_files/C.fastq']
The print output when I run the pipeline seems to confirm this. However, every run fails with the error:
WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
Strangely, I can copy one of the paths from the list generated by expand and call snakemake on it directly, e.g. snakemake data/2.fastq_files/A.fastq, and the pipeline completes successfully.
What am I missing?
Upvotes: 2
Views: 1650
Reputation: 8194
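It could be that Snakemake thinks your target rule is convert_ab1_to_fastq and not all. When no target is given on the command line, Snakemake takes the first rule in the Snakefile as the target rule, and convert_ab1_to_fastq contains the {read} wildcard, which would produce exactly this error. Declare all first, and see whether this solves your problem.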
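As a rough sketch of that reordering (not tested against your data, and assuming the same paths and script as in the question), the Snakefile might look like the following. The only change that matters is that all is now the first rule; the use of os.path to derive the read names is an optional tidy-up, not part of the fix.

```snakemake
import glob
import os

def collect_reads():
    # Sample names are the .ab1 file names without directory or extension
    ab1_files = sorted(glob.glob("data/1.ab1_files/*.ab1"))
    return [os.path.splitext(os.path.basename(f))[0] for f in ab1_files]

READS = collect_reads()

# First rule in the file, so it becomes the default target.
# Its inputs are concrete file names with no wildcards left in them.
rule all:
    input:
        expand("data/2.fastq_files/{read}.fastq", read=READS)

# Wildcard rule: only run when some target requests a matching .fastq file
rule convert_ab1_to_fastq:
    input:
        ab1="data/1.ab1_files/{read}.ab1"
    output:
        fastq="data/2.fastq_files/{read}.fastq"
    shell:
        "python code/python_scripts/ab1_to_fastq.py --ab1 {input.ab1} --fastq {output.fastq}"
```

If you would rather leave the rule order as it is, naming the rule explicitly on the command line (snakemake all) should also work, since a rule without wildcards can be requested as a target by name. A dry run with snakemake -n is a cheap way to check which jobs would be scheduled before running anything.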
Upvotes: 2