Reputation: 11
This is my test input:
<license>
<p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>
Desired output:
<license xlink:href="http://creativecommons.org/licenses/by/3.0/">
<p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>
Basically, where the license element does not already have an xlink:href attribute, I am trying to look in its child element (<license-p>; <p> in the test input above) and copy any URL found in the text up to an xlink:href attribute on the parent (license).
Here is my XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="license">
    <xsl:copy>
      <xsl:attribute name="xlink:href">
        <xsl:value-of select='replace(p,"[\s\S]*" ,"(\b(?:(?:https?|ftp):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$]))")'/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="p/@xlink:href"/>
</xsl:stylesheet>
The regex I am using does not work in Saxon, apparently owing to characters like ?.
Upvotes: 0
Views: 732
Reputation: 11
OK folks, I know the regex is far from perfect, but the following works for me:
<xsl:analyze-string
    select="$elValue"
    regex="((https?|ftp|gopher|telnet|file):(()|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*\w*.\w*\W\w*\W\w*\W\d.\d\W)">
  <xsl:matching-substring>
    <xsl:value-of select="regex-group(1)"/>
  </xsl:matching-substring>
</xsl:analyze-string>
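For context, here is a minimal sketch of how an xsl:analyze-string extraction like this might sit inside a complete stylesheet. It is not the exact code I ran. It assumes a deliberately simplified URL pattern (a scheme, then ://, then everything up to the next whitespace, parenthesis or angle bracket), and the variable name $urls is made up for illustration. It also assumes the license element is in no namespace, as in the test input.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only touch license elements that do not already carry xlink:href -->
  <xsl:template match="license[not(@xlink:href)]">
    <!-- $urls (hypothetical name): every URL-like substring in the element's text.
         The pattern is a simplified assumption, not a full URL grammar. -->
    <xsl:variable name="urls" as="xs:string*">
      <xsl:analyze-string select="string(.)"
          regex="(https?|ftp)://[^\s()&lt;&gt;]+">
        <xsl:matching-substring>
          <xsl:sequence select="."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Add the attribute only when a URL was actually found -->
      <xsl:if test="exists($urls)">
        <xsl:attribute name="xlink:href" select="$urls[1]"/>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the test input from the question, this should put http://creativecommons.org/licenses/by/3.0/ into xlink:href, because the match stops at the closing parenthesis. A URL followed directly by a full stop or comma would pick up that punctuation, so the pattern would need tightening for real data.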
Upvotes: 1