Reputation: 878
I understand that we have to use collect() when we run a process that takes two channels as input, where the first channel has one element and the second one has more than one element:
#! /usr/bin/env nextflow
nextflow.enable.dsl=2

process A {
    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {
    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > \"${input1}.txt\"
    """
}

workflow {
    A( Channel.from( 'A' ) )
    // This would only run for one element of the first channel:
    B( Channel.from( 1, 2, 3 ), A.out.foo )
    // and this for all of them as intended:
    B( Channel.from( 1, 2, 3 ), A.out.foo.collect() )
}
Now the question: why does this line in the example workflow from nextflow-io (https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/rnaseq.nf#L15) work without using collect() or toList()?
It is the same situation: a channel with one element (the index) and a channel with more than one element (the fastq pairs) are used by the same process (quant), and it runs on all fastq files. What am I missing compared to my dummy example?
Upvotes: 3
Views: 1992
Reputation: 3881
You need to create the first channel as a value channel, which can be read any number of times without being exhausted.
Your linked example implicitly creates a value channel, which is why it works. The same happens when you call .collect() on A.out.foo.
Channel.from (or the more modern Channel.of) creates a queue channel, whose items are consumed as they are read. A runs once for its single item, its output is again a queue channel holding a single item, and so B also runs only once.
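So A( Channel.value('A') ) is all you need: a process whose inputs are all value channels emits value channels as well, so A.out.foo can then be reused for every element of the other channel.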
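For illustration, here is a sketch of the dummy workflow from the question with that single change applied. The processes A and B are copied from the question. Using Channel.of instead of Channel.from in the call to B is my own substitution and is not required for the fix. I have not run this against a specific Nextflow release, so treat it as a sketch rather than a verified pipeline.

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process A {
    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {
    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > "${input1}.txt"
    """
}

workflow {
    // Value channel as input: A.out.foo is then a value channel too
    A( Channel.value('A') )

    // The queue channel (1, 2, 3) drives three runs of B;
    // the value channel A.out.foo is re-read on each run, no collect() needed
    B( Channel.of( 1, 2, 3 ), A.out.foo )
}
```

With this change B should run once per element of the first channel, the same as the collect() variant in the question.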
Upvotes: 3