Reputation: 395
I have a nextflow process that outputs multiple files, like below:
[chr1,/path/to/chr1_chunk1.TC.linear]
[chr1,/path/to/chr1_chunk1.HDL.linear]
[chr1,/path/to/chr1_chunk2.TC.linear]
[chr1,/path/to/chr1_chunk2.HDL.linear]
.....
The above example is what I got after using the transpose() operator.
Now I want to concatenate all chunks from all chromosomes, ordered by chromosome and chunk number, so that I get one file for TC and another file for HDL. I have multiple traits across many chunks, so this question doesn't help: "output files (chromosomal chunks) merging in nextflow". Any help?
Upvotes: 2
Views: 1584
Reputation: 54502
If your chunk files are sufficiently small, you can use the collectFile operator to concatenate them into files whose names are defined by a dynamic grouping criterion.
The grouping criterion is specified by a closure that must return a pair in which the first element defines the file name for the group and the second element is the actual value to be appended to that file.
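As a minimal sketch of such a grouping closure (the trait names and line contents here are made-up illustration values, not the asker's data), each item is routed to a file named after its first element:

```groovy
workflow {
    Channel
        // Hypothetical (trait, line) pairs standing in for real chunk contents
        .of( ['TC', 'tc line 1'], ['HDL', 'hdl line 1'], ['TC', 'tc line 2'] )
        // The closure returns [ file name for the group, value to append ]
        .collectFile { trait, line ->
            [ "${trait}.txt", line + '\n' ]
        }
        // One file per distinct name is emitted once the upstream channel completes
        .view { f -> "${f.name}:\n${f.text}" }
}
```

This should produce one TC.txt and one HDL.txt. The order of lines inside each file depends on the sort option of collectFile, which is why the sorted variant in this answer passes sort: false.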
To sort by chromosome number and then by chunk number, you can use the toSortedList and flatMap operators to feed the sorted collection into the collectFile operator:
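input_ch
    .map { key, chunk_file ->
        // Extract chromosome number, chunk number and trait from the file name
        def matcher = chunk_file.name =~ /^chr(\d+)_chunk(\d+)\.(\w+)\.linear$/
        def (_, chrom, chunk, trait) = matcher[0]
        // Cast to int so that sorting is numeric (chr2 before chr10)
        tuple( (chrom as int), (chunk as int), trait, chunk_file )
    }
    // Sort by chromosome first, then by chunk
    .toSortedList( { a, b -> (a[0] <=> b[0]) ?: (a[1] <=> b[1]) } )
    .flatMap()
    // sort: false keeps the order in which the items arrive
    .collectFile( sort: false ) { chrom, chunk, trait, chunk_file ->
        [ "${trait}.linear", chunk_file.text ]
    }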
Upvotes: 2
Reputation: 489
You can use a combination of the branch and collectFile operators. Consider the following directory structure (where each .linear file has its own name as its content):
➜ sandbox tree .
.
├── ex1.HDL.linear
├── ex1.TC.linear
├── ex2.HDL.linear
├── ex2.TC.linear
├── ex3.HDL.linear
├── ex3.TC.linear
└── example.nf
I wrote the following minimal reproducible example:
workflow {
    files = Channel.fromPath('**.linear', checkIfExists: true)

    files
        .branch {
            TC: it.name.contains('TC')
            HDL: it.name.contains('HDL')
        }
        .set { result }

    result
        .TC
        .collectFile(name: 'TC.txt', storeDir: '/Users/mribeirodantas/sandbox')

    result
        .HDL
        .collectFile(name: 'HDL.txt', storeDir: '/Users/mribeirodantas/sandbox')
}
After running this pipeline with nextflow run example.nf, I get two new files in the /Users/mribeirodantas/sandbox folder: TC.txt and HDL.txt. The content of TC.txt, for example, is:
ex2.TC.linear
ex3.TC.linear
ex1.TC.linear
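Note that the lines above are not in file-name order. If ordering matters, one possible sketch is to pass a sort closure to collectFile. This assumes the same directory layout as above and relies on the sort option accepting a closure, as the collectFile documentation describes; the filter step is used here only to keep the example short:

```groovy
workflow {
    Channel
        .fromPath('**.linear', checkIfExists: true)
        // Keep only the TC files (same test as the TC branch above)
        .filter { it.name.contains('TC') }
        // Order the appended contents by source file name
        .collectFile(name: 'TC.txt', sort: { it.name })
        .view { it.text }
}
```

Sorting by name is lexicographic, so chunk10 would come before chunk2. For numeric chromosome and chunk ordering, the numbers need to be parsed out of the file name first.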
Upvotes: 4