Danica Scott

Reputation: 201

How to collect channels hierarchically?

How can I run a process with one instance for each pair of values in two lists, then collect the output of those instances along only one of the lists at a time?

For example, if you run this Nextflow script:

numbers = Channel
    .from(1..2)
    .into{numbers1; numbers2}

letters = Channel
    .from('A'..'B')

process p1 {
    input:
    each number from numbers1
    each letter from letters

    output:
    path "${number}${letter}.txt" into foo

    """
    echo "$number $letter" > ${number}${letter}.txt
    """
}

process p2 {
    input:
    path numberletters from foo.collect()
    each number from numbers2

    """
    for file in $numberletters; do
        cat \$file >> $baseDir/${number}.out
    done
    """
}

you get two output files (as expected): 1.out and 2.out. Each of these contains the same set of lines:

1 A
1 B
2 A
2 B

How can I make it so that 1.out contains only 1 A and 1 B, and 2.out contains only 2 A and 2 B? In other words, how can I collect the foo channel so that the p1 outputs are gathered across the letter input only, with a separate p2 instance for each number input?

Upvotes: 3

Views: 432

Answers (1)

Steve

Reputation: 54502

One solution is to have your first process output a tuple which includes the 'number' as the first element, and then call groupTuple() to group together the files that share the same key:

numbers = Channel.of(1..2)
letters = Channel.of('A'..'B')


process p1 {

    input:
    // combine() emits every (number, letter) pair, so p1 runs once per pair
    tuple val(number), val(letter) from numbers.combine(letters)

    output:
    tuple val(number), path("${number}${letter}.txt") into foo

    """
    echo "${number} ${letter}" > "${number}${letter}.txt"
    """
}

process p2 {

    publishDir baseDir, mode: 'copy'

    input:
    // groupTuple() gathers the files that share the same number (the first tuple element)
    tuple val(number), path(numberletters) from foo.groupTuple()

    output:
    path "${number}.out"

    """
    cat $numberletters > "${number}.out"
    """
}

If you know how many elements to expect in each group, you can set the 'size' option so that the groupTuple operator emits each group as soon as it is complete, instead of waiting for the upstream channel to finish.
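As a rough sketch of the 'size' option, the snippet below uses only channel operators, so it does not depend on the processes above. The hard-coded tuples are a stand-in for what p1 would emit, with letters in place of the output files, and the value 2 assumes that every number is paired with exactly two letters.

```nextflow
// Stand-in for the (number, file) tuples emitted by p1; the letters take the
// place of the files so that no process is needed to try this out.
// size: 2 is an assumption: each number is paired with exactly two letters,
// so a group is emitted as soon as it holds two items.
Channel
    .of([1, 'A'], [1, 'B'], [2, 'A'], [2, 'B'])
    .groupTuple(size: 2)
    .view()
```

In the pipeline above, the equivalent change would be to write foo.groupTuple(size: 2) in the input declaration of p2.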

Upvotes: 2
