Reputation: 824
I am trying to find a solution to the following problem in xquery 1.0. I need to use just recursive functions alone. The input will always have characters between A-Z only.
1. AAABBCCD --> A1A2A3-B1B2-C1C2-D1
2. ABBAB --> A1-B1B2-A2-B3
Thanks. What I have tried so far does not lead anywhere.
Upvotes: 2
Views: 240
Reputation: 243479
I am trying to see how to achieve the following in xquery 1.0. I am looking for a solution using the recursive functions alone. The input will always have characters between A-Z only.
Here is a single XPath 2.0 expression:
string-join(
(for $s in .,
$indS in 1 to string-length($s),
$cp in string-to-codepoints($s)[$indS]
return
(
'-'[$indS gt 1 and $cp ne string-to-codepoints($s)[$indS -1]]
,
concat(codepoints-to-string($cp),
for $i in 1 to count(index-of(string-to-codepoints($s), $cp))
return
$i[index-of(string-to-codepoints($s), $cp)[$i] eq $indS]
)
)
),
''
)
This, of course, is non-recursive, which I think is an advantage -- we are guaranteed that no call-stack overflow will happen :)
->>>>> Also note, that the input characters may have any value -- not only [A-Z]. <<<<<-
This solution works even if the input string has many hundreds of distinct / different characters and is many thousands of characters long. Compare this to a naïve recursive algorithm that, on every new character read from the input string allocates a new sequence of hundreds of numbers...
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="t">
<xsl:sequence select=
"string-join(
(for $s in .,
$indS in 1 to string-length($s),
$cp in string-to-codepoints($s)[$indS]
return
(
'-'[$indS gt 1 and $cp ne string-to-codepoints($s)[$indS -1]]
,
concat(codepoints-to-string($cp),
for $i in 1 to count(index-of(string-to-codepoints($s), $cp))
return
$i[index-of(string-to-codepoints($s), $cp)[$i] eq $indS]
)
)
),
''
)
"/>
</xsl:template>
</xsl:stylesheet>
The above transformation simply evaluates the XPath expression against every <t>
element of the source XML document.
When applied on this XML document:
<z>
<t>AAABBCCD</t>
<t>ABBAB</t>
<t>aBcBaBcc</t>
</z>
the wanted, correct result is produced:
A1A2A3-B1B2-C1C2-D1
A1-B1B2-A2-B3
a1-B1-c1-B2-a2-B3-c2c3
Upvotes: 2
Reputation: 4241
First you probably want to split the string up into individual characters, this can be done in multiple ways (e.g. using fn:substring(...)
), I will use fn:string-to-codepoints($str)
:
declare function local:to-chars($str) {
for $cp in fn:string-to-codepoints($str)
return fn:codepoints-to-string($cp)
};
Example: local:to-chars('ABC')
yields the sequence ('A', 'B', 'C')
.
For your formatting scheme, you have to keep track of
I will track the counts per character in an XQuery 3.0 map and keep the previous character in a separate argument of the recursive function. All there is to do now is to
We are finished when all characters have been processed.
declare function local:format($chars, $counts, $last, $out) {
if(empty($chars)) then $out
else (
let $c := head($chars), (: current char :)
$new-chars := tail($chars), (: rest of the chars :)
$count := ($counts($c), 1)[1], (: old count if present, `1` otherwise :)
$cc := $c || $count, (: char with count :)
$new-out := if($c eq $last) then $out || $cc (: updated output string :)
else $out || '-' || $cc,
$new-counts := map:put($counts, $c, $count + 1) (: updated map :)
return local:format($new-chars, $new-counts, $c, $new-out)
)
};
We call it with an empty map and the first character in the sequence as "previous" character to avoid a dash at the start of the output.
declare function local:format-string($str) {
let $chars := local:to-chars($str)
return local:format($chars, map{}, head($chars), '')
};
This works with arbitrary characters: local:format-string('HELLO')
yields 'H1-E1-L1L2-O1'
.
EDIT:
I must have skipped over the XQuery 1.0 part, I blame lack of caffeine... You can also keep the counts as a sequence of 26 integers instead of a map:
declare function local:format($chars, $counts, $last, $out) {
if(empty($chars)) then $out
else (
let $c := $chars[1],
$new-chars := subsequence($chars, 2),
$pos := string-to-codepoints($c) - 64,
$count := $counts[$pos],
$cc := $c || $count,
$new-out := if($c eq $last) then concat($out, $cc)
else concat($out, '-', $cc),
$new-counts :=
(
subsequence($counts, 1, $pos - 1),
$count + 1,
subsequence($counts, $pos + 1)
)
return local:format($new-chars, $new-counts, $c, $new-out)
)
};
declare function local:format-string($str) {
let $chars := local:to-chars($str),
$counts := for $i in 1 to 26 return 1
return local:format($chars, $counts, head($chars), '')
};
Upvotes: 4