ATpoint

Reputation: 878

nextflow .collect() method in RNA-seq example workflow

I understand that we have to use collect() when a process takes two channels as input, where the first channel has one element and the second has more than one element:


#! /usr/bin/env nextflow

nextflow.enable.dsl=2

process A {

    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {

    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > \"${input1}.txt\"
    """
}

workflow {

    A( Channel.from( 'A' ) )

    // This would only run for one element of the first channel
    // (commented out because DSL2 does not allow invoking B twice in one workflow):
    // B( Channel.from( 1, 2, 3 ), A.out.foo )

    // and this for all of them as intended:
    B( Channel.from( 1, 2, 3 ), A.out.foo.collect() )

}

Now the question: why does this line in the example workflow from nextflow-io (https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/rnaseq.nf#L15) work without collect() or toList()?

It is the same situation: a channel with one element (the index) and a channel with more than one element (the fastq pairs) are consumed by the same process (quant), yet it runs on all fastq files. What am I missing compared to my dummy example?

Upvotes: 3

Views: 1992

Answers (1)

Midnighter

Reputation: 3881

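You need to create the first channel with the value factory, Channel.value. It produces a value channel, which can be read any number of times and is never exhausted.

Your linked example implicitly creates a value channel, which is why it works. The same happens when you call .collect() on A.out.foo, because collect() returns a value channel.

Channel.from (or the more modern Channel.of) creates a queue channel, which can be exhausted. A.out.foo is then a queue channel with a single element, and B stops once that element has been consumed, which is why both A and B only run once.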

So

A( Channel.value('A') )

is all you need.
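For completeness, here is a sketch of the whole dummy workflow with that one change applied. It assumes a recent Nextflow release with DSL2 and a bash shell for the process script, and it swaps Channel.from for Channel.of. The final view() call is only added here to print the results and is not part of the original example.

```nextflow
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

process A {

    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {

    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > "${input1}.txt"
    """
}

workflow {

    // A receives a value channel, so A.out.foo is a value channel as well
    A( Channel.value('A') )

    // Queue channel with three elements: B runs once per element and
    // reuses the single index file each time, no collect() needed
    B( Channel.of( 1, 2, 3 ), A.out.foo )

    // Added for illustration only: print the path of each output file
    B.out.view()
}
```

With this change B should run three times, once for each of 1, 2 and 3, because a process invoked only with value channels emits value channels.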

Upvotes: 3
