Richard J. Acton

Reputation: 915

Renaming .fromFilePairs with regex capture group in closure

I'm new to Nextflow/Groovy/Java and I'm running into some difficulty with a simple regular expression task.

I'm trying to alter the labels of some file pairs. It is my understanding that fromFilePairs returns a data structure of the form:

[
    [common_prefix, [file1, file2]],
    [common_prefix, [file3, file4]]
]

I further thought that:

However, I have tried many variants on the following without success:

params.fastq = "$baseDir/data/fastqs/*_{1,2}_*.fq.gz"

Channel
    .fromFilePairs(params.fastq, checkIfExists:true) {
        file -> 
            // println file.name // returned the common file prefix as I expected
            mt = file.name =~ /(common)_(prefix)/
            // println mt 
            // # java.util.regex.Matcher[pattern=(common)_(prefix) region=0,47 lastmatch=]
            // match objects appear empty despite testing with regexs I know to work correctly including simple stuff like (.*) to rule out issues with my regex
            // println mt.group(0) // #No match found
            mt.group(0) // or a composition like mt.group(0) + "-" + mt.group(1)
    }
    .view()

I've also tried some variant on this using the replaceAll method.

I've consulted the documentation for Nextflow, Groovy and Java and I still can't figure out what I'm missing. I expect it's some stupid syntactic thing or a misunderstanding of the data structure, but I'm tired of banging my head against it when it's probably obvious to someone who knows the language better. I'd appreciate it if anyone could enlighten me on how this works.

Upvotes: 1

Views: 357

Answers (1)

Steve

Reputation: 54502

A closure can be provided to the fromFilePairs channel factory to implement a custom file pair grouping strategy. The closure takes a file and should return the grouping key. The example in the docs just groups the files by their file extensions:

Channel
    .fromFilePairs('/some/data/*', size: -1) { file -> file.extension }
    .view { ext, files -> "Files with the extension $ext are $files" }

This isn't necessary if all you want to do is alter the labels of some file pairs. You can use the map operator for this. fromFilePairs emits tuples in which the first element is the 'grouping key' of the matching pair and the second element is the 'list of files' (sorted lexicographically):

Channel
    .fromFilePairs(params.fastq, checkIfExists:true) \
    .map { group_key, files ->

        tuple( group_key.replaceAll(/common_prefix/, ""), files )
    } \
    .view()
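
If the new label has to be built from capture groups, as in the question, one possible approach is sketched below in plain Groovy. Note that `=~` only creates a java.util.regex.Matcher; no match is attempted until find() or matches() is called, which is why calling group() straight away reports "No match found". The key format 'sampleA_lane1', the pattern, and the renameKey helper are all made up for illustration:

```groovy
// Hypothetical helper: swaps the two underscore-separated parts of a grouping key.
// The key format and the pattern are assumptions, not taken from the question's data.
def renameKey = { String key ->
    def m = key =~ /([^_]+)_([^_]+)/
    // matches() attempts the match against the whole key; only after it
    // returns true do group(1) and group(2) hold the captured text.
    m.matches() ? m.group(2) + '-' + m.group(1) : key
}

assert renameKey('sampleA_lane1') == 'lane1-sampleA'
assert renameKey('nounderscore') == 'nounderscore'   // no match: key returned unchanged

// The same rewrite using replaceAll with back-references in the replacement string
assert 'sampleA_lane1'.replaceAll(/([^_]+)_([^_]+)/, '$2-$1') == 'lane1-sampleA'
```

Inside the pipeline, the replaceAll form could go straight into the map closure, e.g. .map { key, files -> tuple( key.replaceAll(/([^_]+)_([^_]+)/, '$2-$1'), files ) }, with the pattern adjusted to the real file names.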

Upvotes: 1
