voyager

Capturing a URL within text using a regex in XSLT

This is my test input:

<license>
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>

Desired output:

<license xlink:href="http://creativecommons.org/licenses/by/3.0/">
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p> 
</license>

Basically, for each license element that does not already have an xlink:href="http://******" attribute, I am trying to find the URL in the text of its child <license-p> and copy it up to an xlink:href attribute on the parent license element.

Here is my XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">
    <xsl:output method="html" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="license">
        <xsl:copy>
            <xsl:attribute name="xlink:href">                    
                <xsl:value-of select='replace(p,"[\s\S]*" ,"(\b(?:(?:https?|ftp):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&amp;@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&amp;@#\/%=~_|$]))")'/>
            </xsl:attribute> 
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p/@xlink:href"/>   
</xsl:stylesheet>

The regex I am using does not work in Saxon, apparently because of characters such as "?".
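The failures here come from XPath's regex rules rather than from Saxon itself. XPath regular expressions follow XML Schema syntax with a few additions, so \b is not supported and / cannot be escaped as \/; either one makes Saxon reject the pattern. The call also has its arguments in the wrong order: replace() takes (input, pattern, replacement), so the URL pattern was passed as the replacement string (where $ and \ have special meaning) and [\s\S]* became the pattern, which is an error because it can match an empty string. In addition, without the i flag the [-A-Z0-9...] classes only match upper-case letters.

The stylesheet below is a minimal sketch of a corrected version. It assumes XSLT 3.0 (e.g. Saxon-HE) and assumes a URL is simply a scheme (http, https or ftp) followed by any characters up to whitespace or a parenthesis; that pattern is a deliberate simplification, not a full URL grammar.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- The result is an XML document, so serialize it as XML -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only license elements that do not already have xlink:href -->
  <xsl:template match="license[not(@xlink:href)]">
    <xsl:variable name="text" select="string(.)"/>
    <!-- Simplified URL pattern (an assumption, not a full URL grammar):
         a scheme, then characters up to whitespace or a parenthesis.
         The reluctant .*? makes group 1 the FIRST URL in the text;
         the 's' flag lets '.' also match newlines. -->
    <xsl:variable name="pattern" select="'^.*?((https?|ftp)://[^\s()]+).*$'"/>
    <xsl:copy>
      <xsl:if test="matches($text, $pattern, 's')">
        <!-- replace(input, pattern, replacement): keep only group 1 -->
        <xsl:attribute name="xlink:href" select="replace($text, $pattern, '$1', 's')"/>
      </xsl:if>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the test input, this should add xlink:href="http://creativecommons.org/licenses/by/3.0/" (the URL exactly as it appears in the text) to the license element. URLs that themselves contain parentheses or whitespace would be cut short; a stricter pattern can be swapped in, as long as it stays within XPath regex syntax.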

Answers (1)

voyager

OK folks, I know this regex is far from perfect, but the following works for me:

<!-- $elValue holds the string to search; its definition is not shown here -->
<xsl:analyze-string
    select="$elValue"
    regex="((https?|ftp|gopher|telnet|file):(()|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&amp;]*\w*.\w*\W\w*\W\w*\W\d.\d\W)">
    <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
    </xsl:matching-substring>
</xsl:analyze-string>
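The xsl:analyze-string approach can also be written as a complete stylesheet. This is a sketch under two assumptions: the string searched is the text of the license element (standing in for $elValue, which the snippet does not define), and the pattern (https?|ftp)://[^\s()<>"]+ replaces the original one, whose tail \d.\d\W only fits URLs ending in a version segment such as 3.0/. Because the regex attribute is an attribute value template, any { or } in a pattern would have to be doubled.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="license[not(@xlink:href)]">
    <!-- Collect every URL-like substring. as="xs:string*" keeps them as
         separate strings instead of one concatenated text node. -->
    <xsl:variable name="urls" as="xs:string*">
      <!-- Simplified pattern (assumption): scheme, then anything up to
           whitespace, a parenthesis, <, > or a double quote -->
      <xsl:analyze-string select="string(.)" regex="(https?|ftp)://[^\s()&lt;&gt;&quot;]+">
        <xsl:matching-substring>
          <xsl:sequence select="."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:copy>
      <!-- Use the first URL found, if any -->
      <xsl:if test="exists($urls)">
        <xsl:attribute name="xlink:href" select="$urls[1]"/>
      </xsl:if>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Only the first match is used; if a license can mention several URLs and a different one should win, the $urls[1] selection is the place to change.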
