Reputation: 77
I'm planning to move my bioinformatics pipeline into snakemake, as my current pipeline is a collection of scripts that are increasingly hard to follow. Based on the tutorials and documentation, snakemake seems to be a very clear and interesting option for pipeline management. However, I'm not familiar with Python, as I mainly work with bash and R, so snakemake is a little harder for me to learn. I'm facing the following problem.
I have two files, sampleA_L001_R1_001.fastq.gz and sampleA_L001_R2_001.fastq.gz, which are placed in the same directory, sampleA. I want to merge these files using the cat command. This is actually a test run: in the real situation I would have eight separate FASTQ files per sample that should be merged in a similar manner. It is a very simple job, but something is wrong with my code.
snakemake --latency-wait 20 --snakefile /home/users/me/bin/snakefile.txt
rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        'cat {input.reads1} > {output.reads1}'
        'cat {input.reads2} > {output.reads2}'
-------------------------------------------------------------
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 mergeFastq
1
Job 0: Merging FASTQ files...
Waiting at most 20 seconds for missing files.
Error in job mergeFastq while creating output files sampleA_R1.fastq.gz, sampleA_R2.fastq.gz.
MissingOutputException in line 5 of /home/users/me/bin/snakefile.txt:
Missing files after 20 seconds:
sampleA_R1.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job mergeFastq since they might be corrupted: sampleA_R2.fastq.gz
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message.
As you can see, I already tried the --latency-wait option without any success. Do you have any idea what could be the source of my problem? The paths to the files are correct, and the files themselves are not corrupted. I faced a similar problem with wildcards as well, so there must be something that I don't understand about the basics of snakemake.
Upvotes: 4
Views: 4343
Reputation: 968
The problem is in the shell statement: the two quoted strings are concatenated into one command, which generates a file named "sampleA/sampleA_R1.fastq.gzcat". That is why snakemake doesn't find the expected outputs. You can use this syntax, for instance:
rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        """
        cat {input.reads1} > {output.reads1}
        cat {input.reads2} > {output.reads2}
        """
The --latency-wait option is not needed.
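To see why the two quoted lines collapse into one command, here is a small plain-Python sketch that can be run outside snakemake. It relies on the fact that Python joins adjacent string literals without any separator. The `inp` and `out` objects are hypothetical stand-ins for snakemake's `input` and `output`, and `str.format` only approximates how snakemake fills in the placeholders.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for snakemake's input/output objects (assumption:
# attribute access via str.format is close enough for illustration).
inp = SimpleNamespace(
    reads1="sampleA/sampleA_L001_R1_001.fastq.gz",
    reads2="sampleA/sampleA_L001_R2_001.fastq.gz",
)
out = SimpleNamespace(
    reads1="sampleA/sampleA_R1.fastq.gz",
    reads2="sampleA/sampleA_R2.fastq.gz",
)

# Adjacent string literals are joined by Python with no separator between them.
broken = (
    'cat {input.reads1} > {output.reads1}'
    'cat {input.reads2} > {output.reads2}'
)

# A triple-quoted string keeps the newlines, so the shell sees two commands.
fixed = """
cat {input.reads1} > {output.reads1}
cat {input.reads2} > {output.reads2}
"""

broken_cmd = broken.format(input=inp, output=out)
fixed_cmd = fixed.format(input=inp, output=out)

print(broken_cmd)  # a single line; the first redirect target ends in "gzcat"
print(fixed_cmd)   # two separate cat commands, one per line
```

In the broken form the first redirect points at a file whose name ends in "gzcat", so sampleA/sampleA_R1.fastq.gz is never created, which matches the MissingOutputException in the question.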
Upvotes: 4