bts

Reputation: 695

In XSLT can I tokenize on nothing?

I need to convert the string 'abcdef' to its parts, 'a', 'b', 'c', 'd', 'e', 'f'. Stupidly I tried tokenize('abcdef', '') but of course that returns a FORX0003 error (The regular expression in tokenize() must not be one that matches a zero-length string).

I'm actually trying to convert the string finally to 'a/b/c/d/e/f' so any shortcuts that would get me directly to this state would also be useful.

(I'm using Saxon 9.3 for .NET platform)

Upvotes: 5

Views: 689

Answers (2)

Dimitre Novatchev

Reputation: 243479

To get the desired character sequence from a string $str, use the pair of functions string-to-codepoints() and codepoints-to-string():

for $c in string-to-codepoints($str)
 return
    codepoints-to-string($c)

To get this character sequence joined with '/' as the join-string, simply apply string-join() on the above expression.

Here is a full code example:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

 <xsl:template match="/">
     <xsl:sequence select=
      "string-join(
              for $c in string-to-codepoints('ABC')
              return
                 codepoints-to-string($c),
            '/'
                     )
      "/>
 </xsl:template>
</xsl:stylesheet>

When applied to any XML document (the input is not used), this transformation produces the wanted result:

A/B/C

Explanation:

string-to-codepoints($str) produces a sequence of code-points (think of them as "character codes") representing each character of the string.

For example:

string-to-codepoints('ABC')

produces the sequence:

65 66 67

codepoints-to-string($code-seq)

is the inverse function of string-to-codepoints(). Given a sequence of codepoints, it produces the string whose characters are represented by the codepoints in the sequence. Thus:

codepoints-to-string((65,66,67))

produces the string:

ABC

Therefore:

for $c in string-to-codepoints($str)
 return
    codepoints-to-string($c)

gets the codepoint of each individual character in $str and converts it to a separate string.

Using string-join() we then join all such separate strings using the provided join-character "/".
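
For the string in the question, the same expression can be wrapped in a reusable function. This is only a minimal sketch: the function name my:join-chars, its parameter names, and the namespace URI http://example.com/my are illustrative assumptions and do not come from the answer above.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
    <xsl:output method="text"/>

    <!-- Hypothetical helper: splits $str into single characters
         and joins them with $sep -->
    <xsl:function name="my:join-chars" as="xs:string">
        <xsl:param name="str" as="xs:string?"/>
        <xsl:param name="sep" as="xs:string"/>
        <xsl:sequence select=
          "string-join(
               for $c in string-to-codepoints($str)
               return codepoints-to-string($c),
               $sep)"/>
    </xsl:function>

    <xsl:template match="/">
        <!-- The question's input; expected to yield a/b/c/d/e/f -->
        <xsl:sequence select="my:join-chars('abcdef', '/')"/>
    </xsl:template>
</xsl:stylesheet>

Note that the split happens per Unicode codepoint, so a base letter followed by a combining mark would end up as two separate items.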

Upvotes: 5

FailedDev

Reputation: 26930

Use this line:

replace(replace($input, "(.)", "$1/", "s"), "(.*).$", "$1", "s")

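Here $input holds your original string. The inner replace() appends a '/' after every character, and the outer replace() drops the final character, which is the trailing '/'. The "s" flag lets "." match newline characters as well. For 'abcdef' the expression returns the desired string: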

a/b/c/d/e/f
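
A minimal sketch of how this expression might be placed in a complete XSLT 2.0 stylesheet follows. The global variable named input and its hard-coded value are assumptions for illustration. Inside the double-quoted select attribute, the string literals are written with single quotes.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Illustrative input; in practice this would come from the source document -->
    <xsl:variable name="input" select="'abcdef'"/>

    <xsl:template match="/">
        <!-- Inner replace: 'abcdef' becomes 'a/b/c/d/e/f/' -->
        <!-- Outer replace: drops the last character, giving 'a/b/c/d/e/f' -->
        <xsl:sequence select=
          "replace(replace($input, '(.)', '$1/', 's'), '(.*).$', '$1', 's')"/>
    </xsl:template>
</xsl:stylesheet>

Neither pattern can match a zero-length string, so this avoids the FORX0003 error mentioned in the question. An empty $input simply yields an empty string.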

Upvotes: 2
