Reputation: 25
I am new to Nextflow scripts. I am trying to build a mitochondrial DNA variant pipeline. I am using FastQC and Trimmomatic for quality checking and for trimming low-quality sequences. I have written the script below; the program executes but produces no output.
#!/usr/bin/env nextflow
params {
fastq_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/*.fastq.gz"
fastqc_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/fastqc_report"
trimmed_dir = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmed_fastq"
trimmomatic_jar = "/mnt/e/Bioinformatics_ppt_learning/mtDNA/nextflow_scripts/trimmomatic-0.39.jar"
}
process FastQC {
tag "Running FastQC on ${fastq}"
publishDir "${fastqc_dir}/${fastq.baseName}"
input: path fastq
script:
"""
fastqc -o ${fastqc_dir} ${fastq}
"""
}
process Trimmomatic {
tag "Trimming ${fastq.baseName}"
input:
path read1 from FastQC.output
output:
file(joinPath(trimmed_dir, "${read1.baseName}_trimmed.fastq.gz"))
script:
"""
java -jar ${params.trimmomatic_jar} PE -threads 4 \
${read1} ${joinPath(trimmed_dir, "${read1.baseName}_trimmed.fastq.gz")} \
${joinPath(trimmed_dir, "${read1.baseName}_unpaired.fastq.gz")} \
${joinPath(trimmed_dir, "${read1.baseName}_unpaired.fastq.gz")}
"""
}
workflow {
fastq_files = Channel.fromPath(params.fastq_dir)
fastq_files.each {
FastQC(fastq: it)
Trimmomatic(read1: FastQC.output)
}
}
Upvotes: 0
Views: 242
Reputation: 2809
There are a few issues with your code. I will address what I can see here; this most likely will not fix everything, but it should get you closer to a first working version.
So instead of
workflow {
fastq_files = Channel.fromPath(params.fastq_dir)
fastq_files.each {
FastQC(fastq: it)
Trimmomatic(read1: FastQC.output)
}
}
you ought to write something like this:
workflow {
fastq_files = Channel.fromPath(params.fastq_dir)
FastQC(fastq_files)
Trimmomatic(FastQC.out)
}
Or, since your processes have simple outputs that feed into each other, you could do this:
workflow {
Channel.fromPath(params.fastq_dir) | FastQC | Trimmomatic
}
However, in practice, realistic pipelines get complicated quickly (several outputs per process) and you may need to revert to the longer non-piping form above.
As is already done in the workflow code above, you don't need to link the input of Trimmomatic to the output of FastQC explicitly in the definition of the input read1; that is old-style Nextflow and makes it more difficult to reuse processes across pipelines.
Your processes do not define outputs, so it comes as no surprise that nothing gets published, and frankly it should not work at all. So please add the corresponding output: sections as indicated in the Nextflow documentation.
At least in FastQC you specify the output location twice: once with the publishDir directive (the correct way) and again in the process script itself, using absolute paths for the actual output files/directories (wrong). Fix:
a. Keep the publishDir in FastQC, and add one for Trimmomatic.
b. Each script should generate its output using relative paths in the process working directory.
c. Change or add an output: section in each process naming the output files, so that these get published based on the information in the publishDir directive.
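The three fixes above can be sketched for FastQC as follows. This is only a minimal sketch: it assumes FastQC's default output names (ending in _fastqc.html and _fastqc.zip), and the copy publishing mode is my own choice rather than something taken from the question.

```nextflow
process FastQC {
    tag "FastQC on ${fastq.simpleName}"
    // A single publishDir decides where the declared outputs end up.
    publishDir "${params.fastqc_dir}", mode: 'copy'

    input:
    path fastq

    output:
    // Declaring the files tells Nextflow what there is to publish.
    path "*_fastqc.{html,zip}"

    script:
    // Write into the task working directory (relative path), not an absolute one.
    """
    fastqc -o . ${fastq}
    """
}
```

Trimmomatic would follow the same pattern, with its own publishDir and an output: pattern matching the trimmed files.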
The following are rather minor points, perhaps not even real problems, just style:
I am not sure whether .output would actually work to refer to a process output. In my experience, and based on the docs, .out should do.
Channel.fromPath is given a parameter named fastq_dir, which invites being read as the directory containing the FASTQ files rather than a list of them. Instead of expecting the user to add the required wildcard characters for expansion, as is done in its default value, I would add the wildcards in the code like so:
Channel.fromPath("${params.fastq_dir}/**/*.fastq.gz")
Upvotes: 0
Reputation: 1091
publishDir works by emitting items in the process output declaration to the path provided. You haven't provided an output declaration for either process, so it doesn't think there is anything to publish.
Also, unless you're using it for checkpointing, you don't need the output from FastQC for Trimmomatic; you can get the two processes to run in parallel.
Don't use joinPath or any absolute path in your processes. That's not what Nextflow is designed for, and it will often lead to errors. Plus, by putting an absolute path in the output declaration, you're telling the process to look in the output directory for the file generated in the process. Use publishDir to emit files.
The file qualifier is deprecated. Use path instead. The Nextflow documentation is amazing. It's a steep learning curve, but the docs are very good at describing how things work.
So here is an updated script:
process FastQC {
tag "Running FastQC on ${sampleid}"
publishDir "${params.fastqc_dir}/${fastq.baseName}", mode: 'move'
input:
tuple val(sampleid), path(fastq)
output:
path("*.html")
script:
"""
fastqc ${fastq}
"""
}
process Trimmomatic {
tag "Trimming ${sampleid}"
publishDir "${params.trimmed_dir}", mode: 'copy'
input:
tuple val(sampleid), path(fastq)
output:
path("*_trimmed.fastq.gz")
script:
"""
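# Assumes single-end reads, since the channel emits one file per task; PE mode needs two inputs and four outputs.
# Append the trimming steps you need (for example SLIDINGWINDOW or MINLEN) after the output file.
java -jar ${params.trimmomatic_jar} SE -threads 4 \
${fastq} ${sampleid}_trimmed.fastq.gz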
"""
}
In the workflow, you shouldn't need to tell the processes to iterate over each element. This is the default behaviour of the tool. I've added some operators to the channel generation to show a few safeguards you can add.
// Assumes params.fastq_dir is a directory, not a glob pattern.
Channel
.fromPath("${params.fastq_dir}/*{.fastq.gz,.fq.gz,.fastq,.fq}")
.map { it -> tuple( it.simpleName, it ) }
.ifEmpty { error "Cannot find any fastq files in ${params.fastq_dir}" }
.set { fastq_files }
workflow {
FastQC(fastq_files)
Trimmomatic(fastq_files)
}
EDIT: Missed some of the absolute paths. Updated the input to be a tuple instead, since it's better at handling names this way, and adjusted the tags.
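If the reads are in fact paired-end, as the PE mode in the question suggests, each task needs both mates. Here is a minimal sketch under some assumptions: the files are named like sample_R1.fastq.gz and sample_R2.fastq.gz (a hypothetical naming scheme), params.fastq_dir is a directory, and MINLEN:36 is only an illustrative trimming step, not a recommendation.

```nextflow
process TrimmomaticPE {
    tag "Trimming ${sampleid}"
    publishDir "${params.trimmed_dir}", mode: 'copy'

    input:
    // reads holds the two mates as a list: [R1, R2]
    tuple val(sampleid), path(reads)

    output:
    // Only the paired, trimmed files are declared and therefore published.
    tuple val(sampleid), path("${sampleid}_R{1,2}_trimmed.fastq.gz")

    script:
    """
    java -jar ${params.trimmomatic_jar} PE -threads 4 \
        ${reads[0]} ${reads[1]} \
        ${sampleid}_R1_trimmed.fastq.gz ${sampleid}_R1_unpaired.fastq.gz \
        ${sampleid}_R2_trimmed.fastq.gz ${sampleid}_R2_unpaired.fastq.gz \
        MINLEN:36
    """
}

workflow {
    // The _R{1,2} pattern is an assumption about file naming; adjust it to your data.
    read_pairs = Channel.fromFilePairs("${params.fastq_dir}/*_R{1,2}.fastq.gz")
    TrimmomaticPE(read_pairs)
}
```

Channel.fromFilePairs emits a tuple of the shared sample prefix and the list of matching files, which is why the input keeps the same tuple shape as in the script above.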
Upvotes: 1