Trond

Reputation: 403

Removing invalid characters in XML with XSLT

In an XML file, the source contains the escaped sequence &lt;#&gt;. This causes problems in another application, which interprets it as <#>.

I am using XSLT 2.0, and I have tried a replace on anything from # to div to remove the &lt;#&gt; altogether, substituting an empty string. I did this by putting the structure inside a variable and running the replace on it.

The result of the replace function is that I am losing all other elements as well. Any suggestions are welcome. The input could look something like this:

`<html>
 <body>
   <p>&lt;#&gt;This is just a test</p>
 </body>
</html>`

But it can also look something like this:

`<html>
 <body>
   &lt;#&gt;<p>This is just a test</p>
 </body>
</html>`

The desired output is:

`<html>
 <body>
   <p>This is just a test</p>
 </body>
</html>`

The XSL I have tried is below; it removes all elements. I do see that I am running the replace inside a copy-of, so that might be wrong:

`<xsl:template name="body">
  <xsl:copy-of select="replace($bodycontent, '#', 'div /')" />
 </xsl:template>

   <xsl:variable name="bodycontent">
    <xsl:apply-templates select="/newsMessage/itemSet/newsItem/contentSet/inlineXML/h:html/h:body/h:section/h:p" />
    <p class="txt-ind">
        <xsl:value-of select="//rightsInfo/copyrightHolder/name" />
    </p>
</xsl:variable>`

Upvotes: 1

Views: 2254

Answers (1)

Martin Honnen

Reputation: 167716

If you use

`<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
    <xsl:value-of select="replace(., '&lt;#&gt;', '')"/>
</xsl:template>`

then every occurrence of the string <#> in a text node is removed, while all elements and attributes are copied through unchanged. A working example is online at http://xsltransform.net/gWvjQfu
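
For reference, the two templates can be placed in a complete stylesheet. This is only a minimal sketch, assuming an XSLT 2.0 processor (for example Saxon), since replace() is not available in XSLT 1.0. The stylesheet wrapper is added here for illustration and would need to be merged with your existing templates and namespace declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every element, attribute, comment
       and processing instruction through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific template for text nodes only: the attribute value
       '&lt;#&gt;' is parsed to the three characters <#>, which are
       then removed from the text. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '&lt;#&gt;', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

The attempt in the question loses the elements because replace() expects a string as its first argument. Passing $bodycontent atomizes the whole temporary tree to its string value, so the markup is discarded before copy-of ever sees it. Applying replace() per text node, as above, leaves the element structure to the identity template.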

Upvotes: 3
