Reputation: 201
How can I run a process with one instance for each pair of values in two lists, then collect the output of those instances along only one of the lists at a time?
For example, if you run this Nextflow script:
numbers = Channel
    .from(1..2)
    .into{numbers1; numbers2}
letters = Channel
    .from('A'..'B')

process p1 {
    input:
    each number from numbers1
    each letter from letters

    output:
    path "${number}${letter}.txt" into foo

    """
    echo "$number $letter" > ${number}${letter}.txt
    """
}

process p2 {
    input:
    path numberletters from foo.collect()
    each number from numbers2

    """
    for file in $numberletters; do
        cat \$file >> $baseDir/${number}.out
    done
    """
}
you get two output files (as expected): 1.out and 2.out. Each of these contains the same set of lines:
1 A
1 B
2 A
2 B
How can I make it so that 1.out contains only 1 A and 1 B, and 2.out contains only 2 A and 2 B? In other words, how can I make .collect() on the foo channel gather the p1 outputs only across their letter input, while keeping outputs with different number inputs in separate instances?
Upvotes: 3
Views: 432
Reputation: 54502
One solution is to have your first process output a tuple which includes the 'number' as the first element, and then call groupTuple() to group together the files that share the same key:
numbers = Channel.of(1..2)
letters = Channel.of('A'..'B')

process p1 {
    input:
    // combine() emits every (number, letter) pair, so p1 runs once per pair
    tuple val(number), val(letter) from numbers.combine(letters)

    output:
    // Emit the number alongside the file so it can serve as the grouping key
    tuple val(number), path("${number}${letter}.txt") into foo

    """
    echo "${number} ${letter}" > "${number}${letter}.txt"
    """
}

process p2 {
    publishDir baseDir, mode: 'copy'

    input:
    // groupTuple() gathers all files that share the same number into one tuple
    tuple val(number), path(numberletters) from foo.groupTuple()

    output:
    path "${number}.out"

    """
    cat $numberletters > "${number}.out"
    """
}
If you know how many elements to expect in each group, you can set the 'size' attribute to allow the groupTuple operator to stream the collected values as soon as possible.
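As a rough sketch of that last point, the p2 process above could pass 'size' to groupTuple. The value 2 is an assumption based on the example having exactly two letters ('A' and 'B') per number; adjust it to match your own data.

```groovy
process p2 {
    publishDir baseDir, mode: 'copy'

    input:
    // size: 2 assumes each number has exactly two files (one per letter),
    // so a group is emitted as soon as both files for that number arrive
    tuple val(number), path(numberletters) from foo.groupTuple(size: 2)

    output:
    path "${number}.out"

    """
    cat $numberletters > "${number}.out"
    """
}
```

Without 'size', groupTuple has to wait for the foo channel to complete before it can emit any group, since it cannot know whether more files for a given number are still to come.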
Upvotes: 2