atif

Reputation: 1147

Pattern match in XSLT

I have the following XML:

<xml>
    <para>
       <number>1</number>
       <text> Paragraph 1(<italic>A</italic>) is this para.</text>
    </para>
</xml>

I want to match the text element if it contains a pattern that starts with the word Paragraph, followed by a space, followed by one or more digits, followed by "(", then an italic node and digit, then a closing ")". The match should then be wrapped in an anchor tag, so the output for the above XML should be:

<xml>
    <para>
       <number>1</number>
       <text> <a href="Paragraph1(A)">Paragraph 1(<italic>A</italic>)</a> is this para.</text>
    </para>
</xml>

That is, wrap Paragraph 1(<italic>A</italic>) in an a tag whose href value is the matched text with the spaces and the italic markup removed.

Any help or hints on how to handle this with regex would be appreciated.

Upvotes: 3

Views: 9144

Answers (4)

Rookie Programmer Aravind

Reputation: 12154

Why do you need regex for this? What's wrong with the code below?

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

  <xsl:template match="/xml/para/text">
    <xsl:copy>
      <a href="Paragraph1(A)">
        <xsl:apply-templates select="@*|node()"/>
      </a>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Upvotes: 0

Mads Hansen

Reputation: 66714

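This XSLT 2.0 stylesheet produces the desired href for the sample input (note that the a element it generates wraps the entire content of text, not just the matched phrase):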

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output omit-xml-declaration="no" indent="yes"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only our text element requires special handling here....-->
    <xsl:template match="text[matches(.,'Paragraph\s+\d+')]">
        <xsl:copy>
            <xsl:variable name="textElement" select="."/>
            <xsl:analyze-string select="." regex="(Paragraph\s+\d+)(\(.*\))">
                <xsl:matching-substring>
                    <a href="{concat(replace(regex-group(1),'\s',''),regex-group(2))}">
                        <xsl:apply-templates select="$textElement/node()"/>
                    </a>
                </xsl:matching-substring>
            </xsl:analyze-string>       
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Upvotes: 2

Adolfo Perez

Reputation: 2874

This should give you an idea of how you could solve it:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="/">
    <xsl:apply-templates/>
</xsl:template>

<!-- Only our text element requires special handling here -->
<xsl:template match="text">
    <xsl:copy>
        <xsl:choose>
            <!-- \d+ because the question asks for one or more digits -->
            <xsl:when test="matches(.,'Paragraph\s+\d+')">
                <!-- Save original text value here -->
                <xsl:variable name="temp" select="."/>
                <!-- Save the value of <italic>x</italic> child element -->
                <xsl:variable name="italic_val" select="italic/text()"/>
                <xsl:analyze-string select="." regex="(Paragraph\s+\d+)">
                    <xsl:matching-substring>
                        <xsl:element name="a">
                            <xsl:attribute name="href">
                                <xsl:value-of select="concat(replace(regex-group(1),'\s',''),'(',$italic_val,')')"/>
                            </xsl:attribute>
                            <xsl:value-of select="$temp"/>
                        </xsl:element>
                    </xsl:matching-substring>
                </xsl:analyze-string>
            </xsl:when>
            <xsl:otherwise>DOESNT MATCH</xsl:otherwise>
        </xsl:choose>
    </xsl:copy>
</xsl:template>

<!-- Identity template: copies everything else unchanged -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

It uses the XSLT identity template to copy the original document and defines a separate template to handle the <text> element. That template tests the element's text content against a regex for the word Paragraph followed by a number and, if it matches, generates the anchor sub-structure. Temporary variables hold the original text and the value of the italic child.

Here is my output file:

<xml>
  <para>
    <number>1</number>
    <text><a href="Paragraph1(A)"> Paragraph 1(A) is this para.</a></text>
  </para>
</xml>

The output still contains plain Paragraph 1(A) rather than the Paragraph 1(<italic>A</italic>) that the question asks for, but fixing that is just a matter of some tweaking.
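The italic markup goes missing because the regex and xsl:value-of see only the string value of the element, where child tags are already gone. As a rough illustration only, here is a minimal Python sketch in which ElementTree stands in for the XSLT data model (that stand-in and the variable names are assumptions, not part of the stylesheet above):

```python
import xml.etree.ElementTree as ET

text = ET.fromstring(
    "<text> Paragraph 1(<italic>A</italic>) is this para.</text>"
)

# Roughly what the XPath string value is: all descendant text, markup dropped.
string_value = "".join(text.itertext())

# The serialized element, which still contains the <italic> tags.
serialized = ET.tostring(text, encoding="unicode")

print(repr(string_value))
print(serialized)
```

A regex run against the string value can therefore locate the phrase, but getting the italic element back into the output means copying the original child nodes rather than emitting the matched string.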

Take a look at this link; it may help you understand regex in XSLT.

Note that it uses XSLT 2.0.

Upvotes: 1

Gert

Reputation: 230

This regex, without the surrounding quotes:

".*(Paragraph ([0-9]+)\(<italic>([0-9])</italic>\))"

will give you one outer-level capturing group with 2 embedded capturing groups that hold the values. The outer-level capturing group is #1 and the 2 embedded ones are #2 and #3. Note that the literal '(' and ')' characters are escaped with '\' because parentheses are reserved characters in regexes.
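To make the group numbering concrete, here is a minimal sketch using Python's re module rather than an XSLT processor. It runs the pattern against serialized markup. The sample string puts a digit inside italic because [0-9] would not match the letter A from the question; that substitution and the variable names are assumptions for illustration.

```python
import re

# Group 1 is the outer group; groups 2 and 3 are nested inside it.
PATTERN = re.compile(r".*(Paragraph ([0-9]+)\(<italic>([0-9])</italic>\))")

# Hypothetical sample: serialized markup with a digit inside <italic>.
sample = "<text> Paragraph 1(<italic>2</italic>) is this para.</text>"

m = PATTERN.match(sample)
whole, number, italic = m.group(1), m.group(2), m.group(3)

# href as the question describes it: matched text minus spaces and italic tags.
href = re.sub(r"\s|</?italic>", "", whole)

print(whole, number, italic, href)
```

To match a letter such as the A in the question's sample, the third group would need a character class like [0-9A-Z] instead of [0-9].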

Upvotes: 0
