Reputation: 11
I am using snakemake version 7.30.1
I am trying to run my Snakemake workflow using snakemake --cores 4. Snakemake locates the input files and starts the first rule in the workflow, but then exits with a MissingOutputException stating that it cannot find the output files for the second of the two samples in the samples list. This doesn't seem to be an issue with the files themselves: when I switch the order of the samples, the new first sample runs and the new second sample doesn't. I have also tried increasing --latency-wait, but it didn't help.
I am trying to run fastp in my first rule for two samples and two reads. The output should produce the files M31A_150k_1_final.fq, M28B_150k_1_final.fq, M31A_150k_2_final.fq, M28B_150k_2_final.fq:
base_path = "/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/"

# Define list of sample names
samples = ["M31A_150k", "M28B_150k"]

rule all:
    input:
        expand(base_path + "bai/{sample}_all.bam.bai", sample=samples),
        expand(base_path + "bai/{sample}_forward.bam.bai", sample=samples),
        expand(base_path + "bai/{sample}_reverse.bam.bai", sample=samples),
        expand(base_path + "bigwig/{sample}.bw", sample=samples),
        expand(base_path + "bigwig/{sample}_forward.bw", sample=samples),
        expand(base_path + "bigwig/{sample}_reverse.bw", sample=samples)

rule fastp_adaptors:
    input:
        R1 = expand(base_path + "testfiles/{sample}_1.fq", sample=samples),
        R2 = expand(base_path + "testfiles/{sample}_2.fq", sample=samples)
    output:
        R1_final = expand(base_path + "trimmed/{sample}_1_final.fq", sample=samples),
        R2_final = expand(base_path + "trimmed/{sample}_2_final.fq", sample=samples)
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
Here is the log of the error I am receiving:
valeriaaizen@Valerias-MacBook-Pro ~/D/c/n/snakemake-attempt (main)> snakemake --cores 4 (myenv_x86)
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job stats:
job                        count    min threads    max threads
all                            1              1              1
bowtie2                        1              1              1
deeptools_bigwigall            1              1              1
deeptools_bigwigforward        1              1              1
deeptools_bigwigreverse        1              1              1
fastp_adaptors                 1              1              1
merge_83163                    1              1              1
merge_99147                    1              1              1
reverse                        1              1              1
samtools_indexall              1              1              1
samtools_indexforward          1              1              1
samtools_sort                  1              4              4
samtools_sort147               1              1              1
samtools_sort163               1              1              1
samtools_sort83                1              1              1
samtools_sort99                1              1              1
total                         16              1              4
Select jobs to execute...
[Thu Sep 7 14:39:53 2023]
rule fastp_adaptors:
input: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_1.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_1.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_2.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_2.fq
output: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
jobid: 4
reason: Missing output files: /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
resources: tmpdir=/var/folders/4c/h8ky28xj143dkssjycttn5lr0000gn/T
Detecting adapter sequence for read1...
Illumina TruSeq Adapter Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Detecting adapter sequence for read2...
No adapter detected for read2
Read1 before filtering:
total reads: 150000
total bases: 22500000
Q20 bases: 21987079(97.7204%)
Q30 bases: 21372363(94.9883%)
Read2 before filtering:
total reads: 150000
total bases: 22500000
Q20 bases: 21768444(96.7486%)
Q30 bases: 21103172(93.7919%)
Read1 after filtering:
total reads: 136856
total bases: 18856683
Q20 bases: 18594358(98.6088%)
Q30 bases: 18347138(97.2978%)
Read2 after filtering:
total reads: 136856
total bases: 17587532
Q20 bases: 17259790(98.1365%)
Q30 bases: 16852551(95.821%)
Filtering result:
reads passed filter: 273712
reads failed due to low quality: 2162
reads failed due to too many N: 18
reads failed due to too short: 24108
reads with adapter trimmed: 35295
bases trimmed due to adapters: 2204956
Insert size peak (evaluated by paired-end reads): 150
JSON report: fastp.json
HTML report: fastp.html
fastp -w 8 --dont_eval_duplication -i /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_1.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_1.fq -I /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M31A_150k_2.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/testfiles/M28B_150k_2.fq -t 10 -F 10 -o /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq -O /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq --detect_adapter_for_pe
fastp v0.22.0, time used: 8 seconds
Waiting at most 5 seconds for missing files.
MissingOutputException in rule fastp_adaptors in file /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/Snakefile, line 35:
Job 4 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_1_final.fq
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M28B_150k_2_final.fq
Removing output files of failed job fastp_adaptors since they might be corrupted:
/Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_1_final.fq, /Users/valeriaaizen/Documents/code/notebooks/snakemake-attempt/trimmed/M31A_150k_2_final.fq
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-09-07T143950.741220.snakemake.log
Upvotes: 1
Views: 163
Reputation: 9062
rule fastp_adaptors:
    input:
        R1 = expand(base_path + "testfiles/{sample}_1.fq", sample=samples),
        R2 = expand(base_path + "testfiles/{sample}_2.fq", sample=samples)
    output:
        R1_final = expand(base_path + "trimmed/{sample}_1_final.fq", sample=samples),
        R2_final = expand(base_path + "trimmed/{sample}_2_final.fq", sample=samples)
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
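I guess fastp_adaptors has to run once on each pair of FASTQ files (two runs in total in your case). However, since you have expand in your input and output directives, fastp_adaptors runs just once on all pairs together. Your log shows the result: two files are passed after each of -i, -I, -o and -O, and fastp appears to use only the first one, so the outputs for the second sample are never written and Snakemake raises the error. So try removing the expand calls in fastp_adaptors. (If you are new to Snakemake, this is one of the things that trips up beginners.)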
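As a minimal sketch of what the rule might look like without expand, assuming base_path and samples stay defined as in the question and that some downstream rule (or rule all) still requests the trimmed files for each sample:

```snakemake
# Sketch only: {sample} is now a wildcard, so Snakemake creates one
# fastp_adaptors job per sample instead of a single job for all samples.
rule fastp_adaptors:
    input:
        R1 = base_path + "testfiles/{sample}_1.fq",
        R2 = base_path + "testfiles/{sample}_2.fq"
    output:
        R1_final = base_path + "trimmed/{sample}_1_final.fq",
        R2_final = base_path + "trimmed/{sample}_2_final.fq"
    shell:
        """
        fastp -w 8 --dont_eval_duplication -i {input.R1} -I {input.R2} -t 10 -F 10 -o {output.R1_final} -O {output.R2_final} --detect_adapter_for_pe
        """
```

The expand calls in rule all can stay, since that is where the concrete sample names are supposed to be listed. The job stats in the question show a count of 1 for every rule, which suggests the other rules may use expand in the same way and need the same change. A dry run with snakemake -n -p should then list two fastp_adaptors jobs, each with a single file after -i, -I, -o and -O.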
Upvotes: 0