Sam

Reputation: 5270

How to replace characters in an XML document using XSL

I have an XML document (which is not well-formed) as follows:

<ads>
   <adv>
       <a>BURGER & BROWN ENGINEERING</a>
       <b>123*3491</b>
   <adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN & BURGER ENGINEERING</y>
       <z>905*3490</z>
   <adv>
<ads>

I would like an XSLT stylesheet that transforms the XML as follows:

i) The ampersand (&) should be replaced with " and "

ii) The asterisk (*) should be replaced with " "

<ads>
   <adv>
       <a>BURGER and BROWN ENGINEERING</a>
       <b>123 3491</b>
   <adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN and BURGER ENGINEERING</y>
       <z>905 3490</z>
   <adv>
<ads>

I have the following XSL, but it does not do what I need.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>

<xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="text()">
  <xsl:value-of select="translate(., '&', ' and ')" />
  <xsl:value-of select="translate(., '*', ' ')" />
</xsl:template>

</xsl:stylesheet>

Upvotes: 0

Views: 1864

Answers (2)

Michael Kay

Reputation: 163587

Your input is not XML, so no tool designed for processing XML will be able to read it.

The best solution with bad XML is always to fix the software that's generating it. But if the software is written by some cowboy outfit that doesn't care about quality or support or users, then that may not be possible.

If you need to repair bad XML, then you will need non-XML tools to do it, typically some combination of Perl/awk/sed. It's not always possible, of course, because if the software is generating XML that's ill-formed, it may also be generating XML that's well-formed but contains the wrong information.

Failing to escape ampersands is quite a common problem, and the right fix depends on how thorough it needs to be. Sometimes you can fix 99% of the problems by replacing any & that isn't followed by a letter, '#', or a digit with &amp;.
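As a rough illustration of that heuristic, here is a minimal Python sketch (Python is only an example of a non-XML tool; the function name `escape_bare_ampersands` is made up for this example). It escapes an & only when the next character is not a letter, '#', or a digit, so existing references such as &amp; or &#38; are left alone. It does not repair anything else, such as missing end tags.

```python
import re

# Matches '&' only when the next character is NOT a letter, '#', or a digit,
# i.e. when it cannot be the start of an entity or character reference.
# An '&' at the very end of the text also matches.
BARE_AMPERSAND = re.compile(r"&(?![A-Za-z0-9#])")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare '&' with '&amp;' (hypothetical helper, heuristic only)."""
    return BARE_AMPERSAND.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_bare_ampersands("<a>BURGER & BROWN ENGINEERING</a>"))
```

This is the "99%" case only: an input like AT&T is left untouched because the & is followed by a letter, so it would still be rejected by an XML parser.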

Upvotes: 2

michael.hor257k

Reputation: 117102

Given a well-formed XML input such as:

XML

<ads>
   <adv>
       <a>BURGER &amp; BROWN ENGINEERING</a>
       <b>123*3491</b>
   </adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN &amp; BURGER ENGINEERING</y>
       <z>905*3490</z>
   </adv>
</ads>

You can use the following stylesheet:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="@*|*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
  <xsl:value-of select="replace(translate(., '*', ' '), '&amp;', 'and')" />
</xsl:template>

</xsl:stylesheet>

to return:

<?xml version="1.0" encoding="UTF-8"?>
<ads>
   <adv>
       <a>BURGER and BROWN ENGINEERING</a>
       <b>123 3491</b>
   </adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN and BURGER ENGINEERING</y>
       <z>905 3490</z>
   </adv>
</ads>

Upvotes: 2
