Reputation: 915
I'm new to nextflow/groovy/java and i'm running into some difficulty with a simple regular expression task.
I'm trying to alter the labels of some file pairs.
It is my understanding that fromFilePairs
returns a data structure of the form:
[
[common_prefix, [file1, file2]],
[common_prefix, [file3, file4]]
]
I further thought that:
.name
method when invoked on a item from this list will give the name, what I have labelled above as common_prefix
fromFilePairs
sets the names of the file pairs.it
in a closure used with fromFilePairs
is a single item from the list of file pairs.however, I have tried many variants on the following without success:
params.fastq = "$baseDir/data/fastqs/*_{1,2}_*.fq.gz"
Channel
.fromFilePairs(params.fastq, checkIfExists:true) {
file ->
// println file.name // returned the common file prefix as I expected
mt = file.name =~ /(common)_(prefix)/
// println mt
// # java.util.regex.Matcher[pattern=(common)_(prefix) region=0,47 lastmatch=]
// match objects appear empty despite testing with regexs I know to work correctly including simple stuff like (.*) to rule out issues with my regex
// println mt.group(0) // #No match found
mt.group(0) // or a composition like mt.group(0) + "-" + mt.group(1)
}
.view()
I've also tried some variant on this using the replaceAll
method.
I've consulted documentation for, nextflow, groovy and java and I still can't figure out what I'm missing. I expect it's some stupid syntactic thing or a misunderstanding of the data structure but I'm tired of banging my head against it when it's probably obvious to someone who knows the language better - I'd appreciate anyone who can enlighten me on how this works.
Upvotes: 1
Views: 357
Reputation: 54502
A closure can be provided to the fromfilepairs operator to implement a custom file pair grouping strategy. It takes a file and should return the grouping key. The example in the docs just groups the files by their file extensions:
Channel
.fromFilePairs('/some/data/*', size: -1) { file -> file.extension }
.view { ext, files -> "Files with the extension $ext are $files" }
This isn't necessary if all you want to do is alter the labels of some file pairs. You can use the map operator for this. The fromFilePairs op emits tuples in which the first element is the 'grouping key' of the matching pair and the second element is the 'list of files' (sorted lexicographically):
Channel
.fromFilePairs(params.fastq, checkIfExists:true) \
.map { group_key, files ->
tuple( group_key.replaceAll(/common_prefix/, ""), files )
} \
.view()
Upvotes: 1