Reputation: 28437
In many languages it is possible to assign regex capture groups to one or more variables. Is this also the case in XQuery? The best we got so far is doing a 'replace by capture group', but that doesn't seem the prettiest option.
This is what we have now:
let $text := fn:replace($id, '(.+)(\d+)', '$1')
let $snr := fn:replace($id, '(.+)(\d+)', '$2')
which works. But I would have hoped there to be something like this:
let ($text, $snr) := fn:matches($id, '(.+)(\d+)');
Does that (or something similar) exist?
Upvotes: 2
Views: 997
Reputation: 16917
If you know a character that cannot occur inside the capture groups, you can use replace to put that character between the groups and then tokenize on it. This already works in XQuery 1.0.
For example:
tokenize(replace("abc1234", "(.+)(\d+)", "$1-$2"), "-")
To make sure replace also drops everything before and after the groups, anchor the pattern and make the first group lazy:
tokenize(replace("abc1234", "^.*?(.+?)(\d+).*?$", "$1-$2"), "-")
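To get from there to the separate variables the question asks for, you can bind the tokenized sequence once and pick the items by position. This is only a minimal sketch: `$id`, its sample value and the `-` separator are assumptions for illustration, and it relies on `-` never occurring in the input.

```xquery
(: assumed sample input; in real code $id comes from elsewhere :)
let $id := "abc1234"
(: one replace call, then split on the separator placed between the groups :)
let $parts := tokenize(replace($id, "^(.+?)(\d+)$", "$1-$2"), "-")
let $text := $parts[1]
let $snr := $parts[2]
return ($text, $snr)
```

Note that replace returns the input unchanged when the pattern does not match, so in that case `$parts[1]` is the whole input and `$parts[2]` is the empty sequence.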
You can generalize that to a function by using string-join to create a replace pattern like "$1-$2-$3-$4" for any separator:
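declare function local:get-matches($input, $regex, $separator, $groupcount) {
  (: build a replacement such as "$1|$2", then split the result on the separator;
     the "q" flag (XQuery 3.0) makes tokenize treat the separator literally :)
  tokenize(
    replace($input, concat("^.*?", $regex, ".*?$"),
            string-join(for $i in 1 to $groupcount return concat("$", $i), $separator)),
    $separator, "q")
};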
local:get-matches("abc1234", "(.+?)(\d+)", "|", 2)
If you do not want to specify the separator yourself, you need a function to find one. A string that is longer than the input string cannot occur in a capture group, so you can always find one by making the separator longer:
declare function local:get-matches($input, $regex, $separator) {
  if (contains($input, $separator)) then
    (: the separator occurs in the input, so double it and try again :)
    local:get-matches($input, $regex, concat($separator, $separator))
  else
    (: count the "(" characters (codepoint 40) to get the number of groups;
       this also counts escaped parentheses, so keep the regex simple :)
    let $groupcount := count(string-to-codepoints($regex)[. = 40])
    return tokenize(
      replace($input, concat("^.*?", $regex, ".*?$"),
              string-join(for $i in 1 to $groupcount return concat("$", $i), $separator)),
      $separator, "q")
};
declare function local:get-matches($input, $regex) {
  local:get-matches($input, $regex, "|#🎄☎")
};
local:get-matches("abc1234", "(.+?)(\d+)")
Upvotes: 0
Reputation: 38682
Plain XQuery 1.0 has no support for returning match groups. The FunctX XQuery function library works around this shortcoming with functx:get-matches, but its implementation should not be considered efficient.
XQuery 3.0 adds the very powerful function fn:analyze-string. It returns both the matching and the non-matching parts of the input, with the matches further split by capture group if the regular expression defines any.
Here is an example from the MarkLogic documentation. The function itself is part of the standard XPath/XQuery 3.0 function library, so it is also available in other XQuery 3.0 implementations:
fn:analyze-string('Tom Jim John',"((Jim) John)")
=>
<s:analyze-string-result>
  <s:non-match>Tom </s:non-match>
  <s:match>
    <s:group nr="1">
      <s:group nr="2">Jim</s:group>
      John
    </s:group>
  </s:match>
</s:analyze-string-result>
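Applied to the question, the groups can be picked out of that result by their nr attribute. The following is a sketch rather than a definitive solution: `$id` and its sample value are assumptions, and the `*:` wildcard is used so the path does not depend on which namespace prefix your engine binds to the result elements.

```xquery
xquery version "3.0";

(: assumed sample input; in real code $id comes from elsewhere :)
let $id := "abc1234"
(: run the regex once and keep the structured result :)
let $result := fn:analyze-string($id, "^(.+?)(\d+)$")
(: each capture group is a group element with its number in @nr :)
let $text := string($result//*:group[@nr = 1])
let $snr := string($result//*:group[@nr = 2])
return ($text, $snr)
```

If the input does not match, the result contains no group elements and both variables end up as empty strings, so you may want to check for that case.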
If you do not have support for XQuery 3.0, some engines provide similar implementation-defined functions or let you call backend functions such as Java code. Read the documentation for your XQuery engine in this case.
Upvotes: 3