Slkrasnodar
Slkrasnodar

Reputation: 824

xquery recursion for counting/formatting characters in a string

I am trying to find a solution to the following problem in xquery 1.0. I need to use just recursive functions alone. The input will always have characters between A-Z only.

1. AAABBCCD --> A1A2A3-B1B2-C1C2-D1
2. ABBAB     --> A1-B1B2-A2-B3

Thanks. What I have tried so far does not lead anywhere.

Upvotes: 2

Views: 240

Answers (2)

Dimitre Novatchev
Dimitre Novatchev

Reputation: 243479

I am trying to see how to achieve the following in xquery 1.0. I am looking for a solution using the recursive functions alone. The input will always have characters between A-Z only.

Here is a single XPath 2.0 expression:

 string-join(
             (for $s in .,
               $indS in 1 to string-length($s),
               $cp in string-to-codepoints($s)[$indS]
                return
                  (
                  '-'[$indS gt 1 and $cp ne string-to-codepoints($s)[$indS -1]]
                    ,
                    concat(codepoints-to-string($cp),
                            for $i in 1 to count(index-of(string-to-codepoints($s), $cp))
                             return
                               $i[index-of(string-to-codepoints($s), $cp)[$i] eq $indS]
                           )
                    )
                ),
              ''
             )

This, of course, is non-recursive, which I think is an advantage -- we are guaranteed that no call-stack overflow will happen :)

->>>>> Also note, that the input characters may have any value -- not only [A-Z]. <<<<<-

This solution works even if the input string has many hundreds of distinct / different characters and is many thousands of characters long. Compare this to a naïve recursive algorithm that, on every new character read from the input string allocates a new sequence of hundreds of numbers...


XSLT 2.0 - based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="t">
    <xsl:sequence select=
     "string-join(
                 (for $s in .,
                   $indS in 1 to string-length($s),
                   $cp in string-to-codepoints($s)[$indS]
                    return
                      (
                      '-'[$indS gt 1 and $cp ne string-to-codepoints($s)[$indS -1]]
                        ,
                        concat(codepoints-to-string($cp),
                                for $i in 1 to count(index-of(string-to-codepoints($s), $cp))
                                 return
                                   $i[index-of(string-to-codepoints($s), $cp)[$i] eq $indS]
                               )
                        )
                    ),
                  ''
                 )
       "/>
  </xsl:template>
</xsl:stylesheet>

The above transformation simply evaluates the XPath expression against every <t> element of the source XML document.

When applied on this XML document:

<z>
    <t>AAABBCCD</t>
    <t>ABBAB</t>
    <t>aBcBaBcc</t>
</z>

the wanted, correct result is produced:

A1A2A3-B1B2-C1C2-D1
A1-B1B2-A2-B3
a1-B1-c1-B2-a2-B3-c2c3

Upvotes: 2

Leo W&#246;rteler
Leo W&#246;rteler

Reputation: 4241

First you probably want to split the string up into individual characters, this can be done in multiple ways (e.g. using fn:substring(...)), I will use fn:string-to-codepoints($str):

declare function local:to-chars($str) {
  for $cp in fn:string-to-codepoints($str)
  return fn:codepoints-to-string($cp)
};

Example: local:to-chars('ABC') yields the sequence ('A', 'B', 'C').

For your formatting scheme, you have to keep track of

  1. how often you encountered each character until that point for the numbering, and
  2. what the previous character was (for the dashes).

I will track the counts per character in an XQuery 3.0 map and keep the previous character in a separate argument of the recursive function. All there is to do now is to

  • go through each character one-by-one,
  • get and update its count in the map,
  • decide if a dash should be added before it in the output string,
  • and recurse.

We are finished when all characters have been processed.

declare function local:format($chars, $counts, $last, $out) {
  if(empty($chars)) then $out
  else (
    let $c          := head($chars),              (: current char :)
        $new-chars  := tail($chars),              (: rest of the chars :)
        $count      := ($counts($c), 1)[1],       (: old count if present, `1` otherwise :)
        $cc         := $c || $count,              (: char with count :)
        $new-out    := if($c eq $last) then $out || $cc  (: updated output string :)
                       else $out || '-' || $cc,
        $new-counts := map:put($counts, $c, $count + 1)  (: updated map :)
    return local:format($new-chars, $new-counts, $c, $new-out)
  )
};

We call it with an empty map and the first character in the sequence as "previous" character to avoid a dash at the start of the output.

declare function local:format-string($str) {
  let $chars := local:to-chars($str)
  return local:format($chars, map{}, head($chars), '')
};

This works with arbitrary characters: local:format-string('HELLO') yields 'H1-E1-L1L2-O1'.

EDIT:

I must have skipped over the XQuery 1.0 part, I blame lack of caffeine... You can also keep the counts as a sequence of 26 integers instead of a map:

declare function local:format($chars, $counts, $last, $out) {
  if(empty($chars)) then $out
  else (
    let $c          := $chars[1],
        $new-chars  := subsequence($chars, 2),
        $pos        := string-to-codepoints($c) - 64,
        $count      := $counts[$pos],
        $cc         := $c || $count,
        $new-out    := if($c eq $last) then concat($out, $cc)
                       else concat($out, '-', $cc),
        $new-counts :=
          (
            subsequence($counts, 1, $pos - 1),
            $count + 1,
            subsequence($counts, $pos + 1)
          )
    return local:format($new-chars, $new-counts, $c, $new-out)
  )
};

declare function local:format-string($str) {
  let $chars := local:to-chars($str),
      $counts := for $i in 1 to 26 return 1
  return local:format($chars, $counts, head($chars), '')
};

Upvotes: 4

Related Questions