My input XML has HTML tags embedded in the Emp_Name and Country elements. The markup is escaped, so our tool reads those tags as &lt; and &gt;. Either of those two fields can contain any HTML tag.
My requirement is to strip those HTML tags to get the desired output below. Could you please help with how this can be achieved in XSLT?
Input XML:
<root>
<Record>
<Emp_ID>288237</Emp_ID>
<Emp_Name> &lt;p&gt;John&lt;/p&gt;</Emp_Name>
<Country>&lt;p&gt;US&lt;/p&gt;</Country>
<Manager>Wills</Manager>
<Join_Date>5/12/2014</Join_Date>
<Experience>9 years</Experience>
<Project>abc</Project>
<Skill>java</Skill>
</Record>
</root>
Desired Output:
<root>
<Record>
<Emp_ID>288237</Emp_ID>
<Emp_Name>John</Emp_Name>
<Country>US</Country>
<Manager>Wills</Manager>
<Join_Date>5/12/2014</Join_Date>
<Experience>9 years</Experience>
<Project>abc</Project>
<Skill>java</Skill>
</Record>
</root>
There are basically two ways to approach this:
1. Turn the escaped markup into real markup by outputting it with disable-output-escaping="yes"; serialize the output; then process the result as described in the previous iteration of this question: https://stackoverflow.com/a/28535511/3016153 To "serialize the output", you need to save the result to a new file and start another XSLT transformation with that new file as its input, unless your processor supports some other form of serialization.
2. Process the escaped markup with a recursive named template that removes the markup. This is awkward and can easily fail if the text contains anything beyond the most basic markup: for example, a > inside an attribute value ends the "tag" too early, and a stray < with no matching > silently drops the rest of the text. Here's an example of how this could work:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform: copy everything unchanged by default -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- for these two elements, replace the content with the markup-free text -->
<xsl:template match="Emp_Name|Country">
  <xsl:copy>
    <xsl:call-template name="remove-markup">
      <xsl:with-param name="string" select="."/>
    </xsl:call-template>
  </xsl:copy>
</xsl:template>

<xsl:template name="remove-markup">
  <xsl:param name="string"/>
  <xsl:choose>
    <!-- '<' must be written as &lt; inside an attribute value -->
    <xsl:when test="contains($string, '&lt;')">
      <!-- output the text that precedes the tag -->
      <xsl:value-of select="substring-before($string, '&lt;')"/>
      <!-- recursive call on whatever follows the '>' that closes this tag -->
      <xsl:call-template name="remove-markup">
        <xsl:with-param name="string" select="substring-after(substring-after($string, '&lt;'), '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- no more tags: output the remaining text as-is -->
      <xsl:value-of select="$string"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
Applied to your input, the result is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Record>
<Emp_ID>288237</Emp_ID>
<Emp_Name> John</Emp_Name>
<Country>US</Country>
<Manager>Wills</Manager>
<Join_Date>5/12/2014</Join_Date>
<Experience>9 years</Experience>
<Project>abc</Project>
<Skill>java</Skill>
</Record>
</root>
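The leading space in <Emp_Name> John</Emp_Name> is not produced by the template: it is the space that precedes the escaped <p> in the input, and remove-markup passes it through. If you need exactly John, as in the desired output, one option is to normalize the stripped text. A minimal sketch, assuming you are also happy to have internal runs of whitespace collapsed (normalize-space() trims and collapses in one step); it replaces the Emp_Name|Country template and reuses the remove-markup template as it is:

```xml
<xsl:template match="Emp_Name|Country">
  <xsl:copy>
    <!-- capture the markup-free text in a variable (a result tree fragment) -->
    <xsl:variable name="text">
      <xsl:call-template name="remove-markup">
        <xsl:with-param name="string" select="."/>
      </xsl:call-template>
    </xsl:variable>
    <!-- converting the fragment to a string is allowed in XSLT 1.0;
         normalize-space() strips leading/trailing whitespace and collapses inner runs -->
    <xsl:value-of select="normalize-space($text)"/>
  </xsl:copy>
</xsl:template>
```

Normalizing after the markup is removed (rather than normalizing the input first) also handles whitespace that sits inside the tags, such as &lt;p&gt; John &lt;/p&gt;.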
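For the first approach, here is a minimal sketch of the first pass, assuming an XSLT 1.0 processor that honours disable-output-escaping (processors are not required to support it, and it only takes effect when the result is serialized to a file or stream). It copies everything unchanged except the text inside Emp_Name and Country, which is written out without escaping, so &lt;p&gt;John&lt;/p&gt; ends up in the new file as real <p>John</p> markup:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8"/>

<!-- identity transform: copy everything unchanged by default -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- write the escaped text out verbatim, so &lt;p&gt; is serialized as <p> -->
<xsl:template match="Emp_Name/text()|Country/text()">
  <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

</xsl:stylesheet>
```

The file this produces is only well-formed if the embedded HTML is itself well-formed XML; a bare <br>, for instance, would make the second parse fail. For the second pass, one possible way (an assumption, not necessarily what the linked answer does) to unwrap the now-real tags is to add a template such as <xsl:template match="Emp_Name//*|Country//*"><xsl:apply-templates/></xsl:template> to an identity transform; it drops each nested element but keeps its text.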