reed woyda


Combining output from multiple Nextflow processes into another process. Nextflow DSL2. FastQC and MultiQC

Goal: Take input and process it through:

  1. initial FastQC
  2. trimming with Sickle
  3. post-trimming FastQC
  4. process both initial and post-trimming FastQC outputs with MultiQC

Input:

  sample 1: sample1_R1_.fastq sample1_R2_.fastq
  sample 2: sample2_R1_.fastq sample2_R2_.fastq

Processes: each process contains a publishDir directive:

publishDir "${params.outdir}/<name>", mode: "copy"

where <name> is either "sickle", "fastqc", or "multiqc".

Output I am getting now:

I have the following code:

workflow {

    SICKLE( reads )
    fastqc_ch = FASTQC( reads, threads )

    sickle_fastqc_ch = SICKLE_FASTQC( SICKLE.out.reads_trimmed, threads )

    fastqc_output = fastqc_ch.collect()
    sickle_fastqc_output = sickle_fastqc_ch.collect()

    combined_output = fastqc_output.merge( sickle_fastqc_output )

    MULTIQC( combined_output )
}

I need help combining the initial and post-trimming FastQC outputs so that MULTIQC processes them together.

Answers (1)

Steve


I think the trick is to combine the FastQC outputs and the Sickle log files into a single channel before calling collect, so that MULTIQC receives everything in one list and runs once. You can use the mix operator for this. For example, using Conda:

Contents of main.nf:

params.reads = '/path/to/fastqs/*_R{1,2}.fastq.gz'
params.multiqc_config = './assets/multiqc_config.yaml'

include { FASTQC as FASTQC_RAW } from './modules/fastqc'
include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'
include { SICKLE_PE } from './modules/sickle'
include { MULTIQC } from './modules/multiqc'


workflow {

    reads = Channel.fromFilePairs( params.reads )

    multiqc_config = file( params.multiqc_config )

    FASTQC_RAW( reads )
    SICKLE_PE( reads )
    FASTQC_TRIMMED( SICKLE_PE.out.trimmed )

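    // Gather every per-sample output that MultiQC should see into one channel
    Channel.empty()
        .mix( FASTQC_RAW.out )
        .mix( SICKLE_PE.out.log )
        .mix( FASTQC_TRIMMED.out )
        .map { sample, files -> files }   // drop the sample name, keep the file(s)
        .collect()                        // emit a single list so MULTIQC runs once
        .set { log_files }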

    MULTIQC( log_files, multiqc_config )
}
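
To see what the mix / map / collect chain in the workflow above does without running any tools, here is a toy sketch. It is only an illustration: the channel names (raw, trimmed) and all sample and file names are made up, and the values are plain strings standing in for the real process outputs.

```nextflow
workflow {

    // Stand-ins for FASTQC_RAW.out and FASTQC_TRIMMED.out.
    // Each item is a tuple of [ sample, files ]; all names are hypothetical.
    raw = Channel.of(
        [ 'sample1', [ 'sample1_R1_fastqc.zip', 'sample1_R2_fastqc.zip' ] ],
        [ 'sample2', [ 'sample2_R1_fastqc.zip', 'sample2_R2_fastqc.zip' ] ]
    )
    trimmed = Channel.of(
        [ 'sample1', [ 'sample1_R1.trimmed_fastqc.zip', 'sample1_R2.trimmed_fastqc.zip' ] ],
        [ 'sample2', [ 'sample2_R1.trimmed_fastqc.zip', 'sample2_R2.trimmed_fastqc.zip' ] ]
    )

    raw
        .mix( trimmed )                   // one channel with four [ sample, files ] items
        .map { sample, files -> files }   // keep only the file names
        .collect()                        // one list; nested lists are flattened by default
        .view()                           // print the single emitted list
}
```

Saved to a scratch file and run with nextflow run, this should print one list containing all eight file names. The order of the items is not guaranteed, because mix emits items as they arrive. A process that takes this channel as its input is therefore invoked once with everything, which is what MULTIQC needs.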

Contents of ./modules/fastqc/main.nf:

process FASTQC {

    tag { sample }

    input:
    tuple val(sample), path(reads)

    output:
    tuple val(sample), path("*_fastqc.{zip,html}")

    """
    fastqc -q ${reads}
    """
}

Contents of ./modules/sickle/main.nf:

process SICKLE_PE {

    tag { sample }

    input:
    tuple val(sample), path(reads, stageAs: 'reads/*')

    output:
    tuple val(sample), path("*.trimmed.fastq.gz"), emit: trimmed
    tuple val(sample), path("${sample}.singles.fastq.gz"), emit: singles
    tuple val(sample), path("${sample}.log"), emit: log

    script:
    def (fq1, fq2) = reads

    """
    sickle pe \\
        -t sanger \\
        -g \\
        -f "${fq1}" \\
        -r "${fq2}" \\
        -o "${sample}_R1.trimmed.fastq.gz" \\
        -p "${sample}_R2.trimmed.fastq.gz" \\
        -s "${sample}.singles.fastq.gz" \\
        1> "${sample}.log"
    """
}

Contents of ./modules/multiqc/main.nf:

process MULTIQC {

    input:
    path 'logs/*'
    path config

    output:
    path "multiqc_report.html", emit: html
    path "multiqc_data", emit: data

    """
    multiqc \\
        --config "${config}" \\
        .
    """
}

Contents of ./nextflow.config:

params {

    outdir = './results'
}

process {

    withName: FASTQC {

        publishDir = [
            path: "${params.outdir}/fastqc",
            mode: 'copy',
        ]

        cpus = 1
        conda = 'fastqc=0.12.1'
    }

    withName: SICKLE_PE {

        publishDir = [
            path: "${params.outdir}/sickle",
            mode: 'copy',
        ]

        cpus = 1
        conda = 'sickle-trim=1.33'
    }

    withName: MULTIQC {

        publishDir = [
            path: "${params.outdir}/multiqc",
            mode: 'copy',
        ]

        cpus = 1
        conda = 'multiqc=1.14'
    }
}

conda {

    enabled = true
}
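
Note that the withName: FASTQC selector above applies to both aliases (FASTQC_RAW and FASTQC_TRIMMED), so the raw and trimmed reports are published to the same directory. If you would rather keep them apart, I believe the selectors can also target the alias names. A minimal sketch, assuming you keep the withName: FASTQC block for cpus and conda and move its publishDir setting into the two blocks below (the "raw" and "trimmed" subdirectory names are my own placeholders):

```nextflow
// nextflow.config (sketch): publish each FASTQC alias to its own directory.
// The "raw" and "trimmed" subdirectory names are illustrative assumptions.
process {

    withName: FASTQC_RAW {

        publishDir = [
            path: "${params.outdir}/fastqc/raw",
            mode: 'copy',
        ]
    }

    withName: FASTQC_TRIMMED {

        publishDir = [
            path: "${params.outdir}/fastqc/trimmed",
            mode: 'copy',
        ]
    }
}
```

This only changes where the reports are copied to. The files staged for MULTIQC come from the work directories, so the path filters in the MultiQC config are unaffected.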

Contents of ./assets/multiqc_config.yaml:

module_order:
    - fastqc:
        name: 'FastQC (raw)'
        anchor: 'fastqc-raw'
        target: 'FastQC'
        path_filters_exclude:
            - './logs/*.trimmed_fastqc.zip'
    - sickle
    - fastqc:
        name: 'FastQC (trimmed)'
        anchor: 'fastqc-trimmed'
        target: 'FastQC'
        path_filters:
            - './logs/*.trimmed_fastqc.zip'

run_modules:
    - fastqc
    - sickle

plots_force_interactive: True

show_analysis_time: False
show_analysis_paths: False

Results:

$ nextflow run main.nf -ansi-log false
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [distraught_euler] DSL2 - revision: 971e2c9d1f
Creating env using conda: fastqc=0.12.1 [cache /path/to/work/conda/env-d3b12ea84164cc521e82b56dc7f119d9]
Creating env using conda: sickle-trim=1.33 [cache /path/to/work/conda/env-72d5fea3bee2c2c7bb1951c0356c97fa]
[d2/302df1] Submitted process > SICKLE_PE (sample2)
[11/13a1f3] Submitted process > SICKLE_PE (sample1)
[ce/f8d7b9] Submitted process > SICKLE_PE (sample3)
[6a/0588fc] Submitted process > FASTQC_RAW (sample3)
[3a/deabf3] Submitted process > FASTQC_RAW (sample1)
[95/e2ddb3] Submitted process > FASTQC_RAW (sample2)
[dd/39b166] Submitted process > FASTQC_TRIMMED (sample2)
[45/bdefdc] Submitted process > FASTQC_TRIMMED (sample3)
[21/c15ebb] Submitted process > FASTQC_TRIMMED (sample1)
Creating env using conda: multiqc=1.14 [cache /path/to/work/conda/env-39798d385be8fa0f1dce9354302302f0]
[4b/45310d] Submitted process > MULTIQC
