Jokhe

Reputation: 77

MissingOutputException in snakemake

I'm planning to move my bioinformatics pipeline to snakemake, as my current pipeline is a collection of scripts that is increasingly hard to follow. Based on the tutorials and documentation, snakemake seems to be a very clear and interesting option for pipeline management. However, I'm not familiar with Python, as I mainly work with bash and R, so snakemake is a little harder for me to learn. I'm facing the following problem.

I have two files, sampleA_L001_R1_001.fastq.gz and sampleA_L001_R2_001.fastq.gz, which are placed in the same directory, sampleA. I want to merge these files using the cat command. This is actually a test run: in a real situation I would have eight separate FASTQ files per sample that should be merged in a similar manner. It is a very simple job, but something is wrong with my code.

snakemake --latency-wait 20 --snakefile /home/users/me/bin/snakefile.txt

rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        'cat {input.reads1} > {output.reads1}'
        'cat {input.reads2} > {output.reads2}'

-------------------------------------------------------------

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   mergeFastq
    1

Job 0: Merging FASTQ files...

Waiting at most 20 seconds for missing files.
Error in job mergeFastq while creating output files sampleA_R1.fastq.gz, sampleA_R2.fastq.gz.
MissingOutputException in line 5 of /home/users/me/bin/snakefile.txt:
Missing files after 20 seconds:
sampleA_R1.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job mergeFastq since they might be corrupted: sampleA_R2.fastq.gz
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message.

As you can see, I already tried the --latency-wait option without any success. Do you have any idea what could be the source of my problem? The paths to the files are correct, and the files themselves are not corrupted. I faced a similar problem with wildcards as well, so there must be something I don't understand about snakemake basics.

Upvotes: 4

Views: 4343

Answers (1)

rioualen

Reputation: 968

The problem is in the shell statement: the two quoted strings are concatenated into a single command, which generates a file named "sampleA/sampleA_R1.fastq.gzcat". That is why snakemake does not find the expected outputs. You can use this syntax, for instance:

rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:"""
        cat {input.reads1} > {output.reads1}
        cat {input.reads2} > {output.reads2}
    """

The --latency-wait option is not needed here, since the missing output is not caused by filesystem latency.
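
To illustrate the concatenation, here is a minimal sketch in plain Python (not a Snakefile). It assumes the placeholders have already been filled in with the file names from the question, and the variable names broken and fixed are only for illustration. Python joins adjacent string literals with nothing in between, so the end of the first command runs straight into the second cat:

```python
# Plain-Python sketch of what the two lines under `shell:` turn into.
# File names are taken from the question; `broken` and `fixed` are
# illustrative names, not part of snakemake.

# Two adjacent string literals: Python joins them with no separator.
broken = (
    'cat sampleA/sampleA_L001_R1_001.fastq.gz > sampleA/sampleA_R1.fastq.gz'
    'cat sampleA/sampleA_L001_R2_001.fastq.gz > sampleA/sampleA_R2.fastq.gz'
)
print(broken)

# Ending the first literal with an explicit shell separator keeps the
# two commands apart even though the literals are still concatenated.
fixed = (
    'cat sampleA/sampleA_L001_R1_001.fastq.gz > sampleA/sampleA_R1.fastq.gz && '
    'cat sampleA/sampleA_L001_R2_001.fastq.gz > sampleA/sampleA_R2.fastq.gz'
)
print(fixed)
```

The first printed line contains the redirect target "sampleA/sampleA_R1.fastq.gzcat", so the shell never creates sampleA_R1.fastq.gz. The triple-quoted string in the rule above avoids this because the newline between the two commands is kept as part of the string.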

Upvotes: 4
