Bram Vanroy

Reputation: 28437

Assign capture groups to variables in XQuery

In many languages it is possible to assign regex capture groups to one or more variables. Is this also the case in XQuery? The best we have come up with so far is a 'replace by capture group', but that doesn't seem the prettiest option.

This is what we have now:

let $text := fn:replace($id, '(.+)(\d+)', '$1');
let $snr := fn:replace($id, '(.+)(\d+)', '$2');

which works. But I would have hoped there to be something like this:

let ($text, $snr) := fn:matches($id, '(.+)(\d+)');

Does that (or something similar) exist?

Upvotes: 2

Views: 997

Answers (2)

BeniBela

Reputation: 16917

If you know that a certain character does not occur within the capture groups, you can use replace to put that character between the groups and then tokenize on it. This works in XQuery 1.0.

For example:

tokenize(replace("abc1234", "(.+)(\d+)", "$1-$2"), "-")

To make sure the replace removes everything before/after the groups:

tokenize(replace("abc1234", "^.*?(.+?)(\d+).*?$", "$1-$2"), "-")
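To get the pieces into separate variables, as the question asks, you could bind the tokenized sequence once and then pick items by position. This is only a minimal sketch: it assumes "|" never occurs in the input, anchors the pattern to the whole string, and uses $groups as a purely illustrative name ($id, $text and $snr come from the question).

```xquery
(: Sketch: split the two capture groups on a separator and bind them by position. :)
(: Assumes the separator "|" does not occur anywhere in $id. :)
let $id := "abc1234"
(: "$1|$2" puts the separator between the two groups. :)
let $joined := replace($id, "^(.+?)(\d+)$", "$1|$2")
(: The separator is escaped because tokenize treats its second argument as a regex. :)
let $groups := tokenize($joined, "\|")
let $text := $groups[1]
let $snr := $groups[2]
return ($text, $snr)
```

For the sample input this should return the two strings "abc" and "1234". If the pattern does not match at all, replace returns the input unchanged, so $groups holds a single item and $snr is the empty sequence.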

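You can generalize that to a function by using string-join to build a replacement string like "$1-$2-$3-$4" for any separator. Note that the "q" flag passed to tokenize below (treat the separator literally rather than as a regex) requires XQuery 3.0: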

declare function local:get-matches($input, $regex, $separator, $groupcount) {
  tokenize(replace($input, concat("^.*?", $regex, ".*?$"), string-join(for $i in 1 to $groupcount return concat("$", $i), $separator)), $separator, "q" )
};
local:get-matches("abc1234", "(.+?)(\d+)", "|", 2)

If you do not want to specify the separator yourself, you need a function to find one. A string that is longer than the input string cannot occur in a capture group, so you can always find a suitable separator by making it longer:

(: Doubles the separator until it no longer occurs in the input. :)
declare function local:get-matches($input, $regex, $separator) {
  if (contains($input, $separator)) then local:get-matches($input, $regex, concat($separator, $separator))
  else
    (: Counts every "(" in the regex (codepoint 40), so this assumes each one opens a capture group. :)
    let $groupcount := count(string-to-codepoints($regex)[. = 40])
    return tokenize(replace($input, concat("^.*?", $regex, ".*?$"), string-join(for $i in 1 to $groupcount return concat("$", $i), $separator)), $separator, "q" )
};
declare function local:get-matches($input, $regex) {
  local:get-matches($input, $regex, "|#🎄☎")
};
local:get-matches("abc1234", "(.+?)(\d+)")

Upvotes: 0

Jens Erat

Reputation: 38682

Plain XQuery 1.0 has no support for returning match groups. The FunctX XQuery function library works around this shortcoming with functx:get-matches, but its implementation is not particularly efficient.

XQuery 3.0 adds the very powerful function fn:analyze-string. It returns both the matching and the non-matching parts of the input, with each match further split by capture group if the regular expression defines any.

An example from the MarkLogic documentation (the function itself is part of the standard XPath/XQuery 3.0 function library, so it is also available in other XQuery 3.0 implementations):

fn:analyze-string('Tom Jim John',"((Jim) John)")

=>
<s:analyze-string-result>
  <s:non-match>Tom </s:non-match>
  <s:match>
    <s:group nr="1">
    <s:group nr="2">Jim</s:group>
    John
    </s:group>
  </s:match>
</s:analyze-string-result>
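Applied to the question, something along these lines might work. It is a sketch, not a tested recipe for any particular engine: the sample value "abc1234" and the lazy pattern are assumptions, and the *: wildcard is used so the path does not depend on which namespace the engine uses for the result elements.

```xquery
xquery version "3.0";

(: Sketch: read the capture groups of the first match into variables. :)
let $id := "abc1234"
(: Take the first match element of the analyze-string result. :)
let $match := fn:analyze-string($id, "(.+?)(\d+)")/*:match[1]
(: Groups may be nested inside each other, so search descendants by their nr attribute. :)
let $text := string($match//*:group[@nr = 1])
let $snr := string($match//*:group[@nr = 2])
return ($text, $snr)
```

For the sample input this should give "abc" and "1234". If nothing matches, $match is empty and both variables end up as empty strings.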

If you do not have support for XQuery 3.0, some engines provide similar implementation-defined functions or let you call backend functions such as Java code. Check the documentation of your XQuery engine in that case.

Upvotes: 3
