Reputation: 1147
I have the following XML:
<text> Paragraph 1(<italic>A</italic>) is this para.</text>
I want to match the text element if I find a pattern that starts with the word Paragraph, followed by a space, followed by one or more digits, followed by "(", then an italic node and digit, and a closing ")". It should then put an anchor tag around the match, so the output for the above XML should be:
<text> <a href="Paragraph1(A)">Paragraph 1(<italic>A</italic>)</a> is this para.</text>
i.e. replace Paragraph 1(<italic>A</italic>) with an a tag whose href value is the matched text without any spaces and without the italic node.
Any help or hints on how to handle this with regex would be appreciated.
Upvotes: 3
Views: 9151
Reputation: 12154
Why do you need regex for this? What's wrong with the code below?
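<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/xml/para/text">
    <a href="Paragraph1(A)">
      <xsl:apply-templates select="@*|node()"/>
    </a>
  </xsl:template>
</xsl:stylesheet>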
Upvotes: 0
Reputation: 66781
This XSLT 2.0 stylesheet produces the desired result:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Only our text element requires special handling here....-->
  <xsl:template match="text[matches(.,'Paragraph\s+\d*')]">
    <xsl:variable name="textElement" select="."/>
    <xsl:copy>
      <xsl:analyze-string select="." regex="(Paragraph\s+\d*)(\(.*\))">
        <xsl:matching-substring>
          <a href="{concat(replace(regex-group(1),'\s',''),regex-group(2))}">
            <xsl:apply-templates select="$textElement/node()"/>
          </a>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Upvotes: 2
Reputation: 2874
This can give you an idea on how you could solve it:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/"><xsl:apply-templates/></xsl:template>
  <!-- Only our text element requires special handling here....-->
  <xsl:template match="text">
    <xsl:copy>
      <xsl:choose>
        <xsl:when test="matches(.,'Paragraph\s+\d*')">
          <!-- Save original text value here -->
          <xsl:variable name="temp" select="."/>
          <!-- Save the value of <italic>x</italic> child element -->
          <xsl:variable name="italic_val" select="italic/text()"/>
          <xsl:analyze-string select="." regex="(Paragraph\s+\d*)">
            <xsl:matching-substring>
              <xsl:element name="a">
                <xsl:attribute name="href">
                  <xsl:value-of select="concat(replace(regex-group(1),'\s',''),'(',$italic_val,')')"/>
                </xsl:attribute>
                <xsl:value-of select="$temp"/>
              </xsl:element>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </xsl:when>
        <xsl:otherwise>DOESNT MATCH</xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
It basically uses the XSLT identity template to copy the original document and defines a template to handle the <text> element. That template analyzes the element's text content against the regex (Paragraph\s+\d*). If the regex matches, it generates the anchor sub-structure; for that I use some temporary variables.
Here is my output file:
<text><a href="Paragraph1(A)"> Paragraph 1(A) is this para.</a></text>
I'm still missing the Paragraph 1(<italic>A</italic>) instead of what I'm getting, Paragraph 1(A), but that's just some tweaking...
Take a look at this link. It may help you understand regex in XSLT.
Note that it uses XSLT 2.0.
Upvotes: 1
Reputation: 230
This regex, without the surrounding quotes:
".*(Paragraph ([0-9]+)\(<italic>([0-9])</italic>\))"
will give you one outer-level capturing group with 2 embedded capturing groups that hold the values. The outer-level capturing group is #1 and the 2 embedded ones are #2 and #3.
Note that the literal '(' and ')' characters are escaped with '\' because parentheses are reserved characters in regexes.
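To see how the three groups could be used, here is a rough sketch in Python's re module, run against the serialized XML string rather than a parsed tree. Note that the pattern as written expects a digit inside italic, so it does not match the question's sample, which has the letter A. The WIDENED pattern and the add_anchor helper are my own assumptions for illustration, not part of the answer above.

```python
import re

# The pattern from the answer above, without the surrounding quotes.
STRICT = re.compile(r".*(Paragraph ([0-9]+)\(<italic>([0-9])</italic>\))")

# Assumption: widen the italic content to letters so the question's "A" matches.
WIDENED = re.compile(r".*(Paragraph ([0-9]+)\(<italic>([0-9A-Za-z])</italic>\))")


def add_anchor(xml_text, pattern=WIDENED):
    """Wrap the matched reference in an <a> tag (hypothetical helper)."""
    m = pattern.match(xml_text)
    if m is None:
        return xml_text  # no reference found, leave the input untouched
    outer, number, italic = m.group(1), m.group(2), m.group(3)
    # href is the matched text without spaces and without the italic markup.
    href = f"Paragraph{number}({italic})"
    start, end = m.span(1)
    return f'{xml_text[:start]}<a href="{href}">{outer}</a>{xml_text[end:]}'


sample = "<text> Paragraph 1(<italic>A</italic>) is this para.</text>"
print(STRICT.match(sample))  # None: [0-9] does not match the letter A
print(add_anchor(sample))
```

Because the leading .* is greedy, this only wraps the last reference in the string. Handling several references per element, or doing the work in XSLT itself, would need a different approach.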
Upvotes: 0