MSW

Reputation: 62

xsl:matching-substring always returns "false"

I'm trying to write a function that extracts the domain name (e.g. www.example.com) from URL text in an XML file.

 <xsl:function name="fdd:get-domain">
    <xsl:param name="url"/>

    <xsl:analyze-string select="$url" regex="^(.*)://([a-zA-Z0-9\-\.]?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?)(/.*)$">
        <xsl:matching-substring>
            <xsl:value-of select="regex-group(1)"/>
        </xsl:matching-substring>

        <xsl:non-matching-substring>
            <xsl:value-of select="false()"/>
        </xsl:non-matching-substring>

    </xsl:analyze-string>
 </xsl:function>

This function always returns false. I'm not sure what I'm missing.

Upvotes: 1

Views: 1089

Answers (1)

Dimitre Novatchev

Reputation: 243579

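The regex attribute is an attribute value template (AVT), and inside an AVT every { and } that is meant literally must be doubled, to distinguish it from the single braces that delimit an XPath expression. In the original code, {2,3} is evaluated as the XPath expression 2,3 and replaced by its string value, so the quantifier is lost and the regex no longer matches the URL. Doubling the curly braces fixes this: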

^(.*)://([a-zA-Z0-9\-\.]?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{{2,3}}(/\S*)?)(/.*)$

With this correction, when the function is called like this:

fdd:get-domain('http://www.abc/cpm/page.aspx')

the result is:

http
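To see what the processor does with single and doubled braces, here is a minimal sketch (the demo element and its attribute names are made up for illustration). Both attributes of the literal result element are AVTs, just like the regex attribute of xsl:analyze-string:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes"/>

 <xsl:template match="/">
  <!-- single braces: the XPath expression 2,3 is evaluated and its
       items are joined with a space -->
  <!-- doubled braces: a literal { and } are written to the attribute -->
  <demo single="[a-zA-Z]{2,3}" doubled="[a-zA-Z]{{2,3}}"/>
 </xsl:template>
</xsl:stylesheet>
```

Applied to any XML document, an XSLT 2.0 processor should output:

```xml
<demo single="[a-zA-Z]2 3" doubled="[a-zA-Z]{2,3}"/>
```

The single attribute shows the regex fragment the original function was actually using, which is why it never matched.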

I guess that you really want to get the domain, as this modified code (both the regular expression and the regex-group index are changed) does:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:fdd="some:fdd">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <xsl:sequence select="fdd:get-domain('http://www.abc.com/cpm/page.aspx')"/>
 </xsl:template>

      <xsl:function name="fdd:get-domain">
        <xsl:param name="url"/>

        <xsl:analyze-string select="$url" regex=
"^(.*)://([a-zA-Z0-9\-\.]?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{{2,3}})(/\S*)?(/.*)$">
            <xsl:matching-substring>
                <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>

            <xsl:non-matching-substring>
                <xsl:value-of select="false()"/>
            </xsl:non-matching-substring>

        </xsl:analyze-string>
     </xsl:function>
</xsl:stylesheet>

When this transformation is applied to any XML document (the document itself is not used), the wanted, correct result is produced:

www.abc.com

Update: As Michael Kay reminded me, the need to double the curly braces can be avoided if the regex is specified as the content of a variable and this variable is referenced in an AVT in the regex attribute of xsl:analyze-string:

<xsl:analyze-string select="$url" regex="{$vRegEx}"
                    flags="mx" >

This has another benefit: because the x flag makes the processor ignore whitespace in the regex, we can split the subexpressions across different lines and even intermix them with comments.

Here is the refactored transformation:

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:fdd="some:fdd">
     <xsl:output method="text"/>

 <xsl:variable name="vRegEx">

   ^(.*) <!-- The scheme -->

   ://

   ([a-zA-Z0-9\-\.]?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}) <!-- The domain -->

   (/\S*)?(/.*)$  <!-- the path and query string -->

 </xsl:variable>

     <xsl:template match="/">
      <xsl:sequence select="fdd:get-domain('http://www.abc.com/cpm/page.aspx')"/>
     </xsl:template>

          <xsl:function name="fdd:get-domain">
            <xsl:param name="url"/>

            <xsl:analyze-string select="$url" regex="{$vRegEx}"
                                flags="mx" >
                <xsl:matching-substring>
                    <xsl:value-of select="regex-group(2)"/>
                </xsl:matching-substring>

                <xsl:non-matching-substring>
                    <xsl:value-of select="false()"/>
                </xsl:non-matching-substring>

            </xsl:analyze-string>
         </xsl:function>
</xsl:stylesheet>

Upvotes: 1
