sanjay

Reputation: 1020

XSLT - apostrophe (') cannot be used in the tokenize() function

I have XML elements like this:

<p>'data1':'2','data2':'Sports like Cricket, Hokey',</p>

I need to split each of these elements into multiple <p> elements, as follows:

<p>'data1':'2'</p>
<p>'data2':'Sports like Cricket, Hokey',</p>

I've written the following XSLT for this task:

<xsl:template match="p">
    <xsl:variable name="tokens" select="tokenize(., ',')"/>
    <xsl:for-each select="$tokens">
        <xsl:analyze-string select="." regex="^&apos;(.*)&apos;:&apos;(.*)$">
            <xsl:matching-substring>
                <p>
                    <xsl:value-of select="."/>
                </p>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:for-each>
</xsl:template>

This code works fine when a comma does not appear in the middle of a value. But if a value contains a comma (e.g. 'Sports like Cricket, Hokey'), the output breaks, as in this example.
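The failure can be reproduced outside XSLT. Below is a minimal Python sketch in which `str.split(",")` stands in for `tokenize(., ',')` (an assumption for illustration only: both split on every comma, with no notion of quoting). The comma inside the quoted value is treated as a separator like any other.

```python
# Illustration only: str.split(",") plays the role of XPath tokenize(., ',').
text = "'data1':'2','data2':'Sports like Cricket, Hokey',"

tokens = text.split(",")

# The comma inside the quoted value also acts as a separator, so the
# second value ends up split across two tokens, and the trailing comma
# produces an empty last token.
for token in tokens:
    print(repr(token))
```

With the question's regex, the token `'data2':'Sports like Cricket` still matches (so a truncated value is output), while ` Hokey'` does not match and is silently dropped.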

I tried to use the tokenize() function as follows, but it seems an apostrophe is not allowed in tokenize() in XSLT:

tokenize(., '',')

Could anyone suggest a solution for this?

Upvotes: 0

Views: 150

Answers (1)

Valdi_Bo

Reputation: 31011

One of the reasons your script failed is that you used &apos; instead of a plain apostrophe (&apos; is used when you write output, but in a regex just use ').

Another reason, visible in the second source <p> element, is that the terminating ' is followed by a comma, whereas your regex ends with $.

So the regex can be e.g.:

'([^']+)'\s*:\s*'([^']+)'

Details:

  • An apostrophe (opening).
  • A non-empty sequence of chars other than an apostrophe.
  • An apostrophe (closing).
  • A colon, possibly surrounded with spaces.
  • The same construction as for the "first" part (before the colon).
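To try the regex outside XSLT, here is a minimal Python sketch using the `re` module. It assumes that Python's regex syntax behaves like the XPath regex dialect for the constructs used here (a negated character class, `\s` and capturing groups); the third sample string is a hypothetical extra case showing the optional spaces around the colon.

```python
import re

# The pattern from the answer: 'key' : 'value', spaces around the colon optional.
PAIR = re.compile(r"'([^']+)'\s*:\s*'([^']+)'")

samples = [
    "'data1':'2','data3':'5'",
    "'data2':'Sports like Cricket, Hokey',",
    "'data4' : 'spaces around the colon'",  # hypothetical extra sample
]

# findall returns one (key, value) tuple per non-overlapping match;
# text between matches (the separating commas) is simply skipped.
results = [PAIR.findall(s) for s in samples]
for s, pairs in zip(samples, results):
    print(s, "->", pairs)
```

Because `[^']+` cannot run past an apostrophe, the comma inside 'Sports like Cricket, Hokey' stays part of the value instead of acting as a separator.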

Below is an example script:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:strip-space elements="*"/>

  <xsl:template match="p">
    <xsl:analyze-string select="." regex="'([^']+)'\s*:\s*'([^']+)'">
      <xsl:matching-substring>
        <p><xsl:value-of select="concat(regex-group(1),
          ' / ', regex-group(2))"/></p>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>

For the following source data:

<?xml version="1.0" encoding="utf-8" ?>
<body>
  <p>'data1':'2','data3':'5'</p>
  <p>'data2':'Sports like Cricket, Hokey',</p>
</body> 

it outputs:

<?xml version="1.0" encoding="UTF-8"?>
<body>
   <p>data1 / 2</p>
   <p>data3 / 5</p>
   <p>data2 / Sports like Cricket, Hokey</p>
</body>

Note that the first source <p> contains two key:value pairs, which produce the first two output <p> elements.
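For comparison, the same transformation can be sketched in Python with `xml.etree.ElementTree` and the regex above. This is a simplification, not a replacement for the stylesheet: it only handles <p> children of the root element (the identity template in the XSLT copies everything else). The `keep_quotes` flag is a hypothetical parameter that outputs the whole match instead, which corresponds to `<xsl:value-of select="."/>` inside `xsl:matching-substring` and gives the quoted form requested in the question.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in the stylesheet.
PAIR = re.compile(r"'([^']+)'\s*:\s*'([^']+)'")

SOURCE = """<body>
  <p>'data1':'2','data3':'5'</p>
  <p>'data2':'Sports like Cricket, Hokey',</p>
</body>"""


def split_pairs(source: str, keep_quotes: bool = False) -> ET.Element:
    """Build a new root with one <p> per 'key':'value' match.

    Text between matches (such as the separating commas) is dropped,
    like xsl:analyze-string without xsl:non-matching-substring.
    """
    root = ET.fromstring(source)
    out = ET.Element(root.tag)
    for p in root.findall("p"):  # direct <p> children of the root only
        for m in PAIR.finditer(p.text or ""):
            new_p = ET.SubElement(out, "p")
            # group(0) is the whole match, like "." in xsl:matching-substring.
            new_p.text = m.group(0) if keep_quotes else f"{m.group(1)} / {m.group(2)}"
    return out


result = split_pairs(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```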

Upvotes: 1
