Jan

Reputation: 930

How to transform HTML encoded XML?

I have an input XML like

<values xsi:type="xsd:string">&lt;Test objectgroupNr="001"/&gt;&lt;bezeichnung&gt;A&amp;amp;B &lt;/bezeichnung&gt;</values>

whose content is escaped (HTML-encoded) markup that I want to turn into "plain" XML:

<values xsi:type="xsd:string">
        <Test objectgroupNr="001"/>
        <bezeichnung>A&amp;B</bezeichnung>
</values>

I could convert some of the characters with a character map:

<xsl:character-map name="fischer">
            <xsl:output-character character="&lt;" string="&lt;"/>
            <xsl:output-character character="&gt;" string="&gt;"/>
</xsl:character-map>      
<xsl:output method="xml" use-character-maps="fischer"/>

But it does not seem to be a good idea to type in all possible special characters like Ä, Ü, ß, é and so on...

Can this be done in an easy way with XSLT? The transformation takes place in the environment of Sonic ESB using Saxon 8.9.

Upvotes: 1

Views: 545

Answers (1)

Martin Honnen

Reputation: 167516

According to http://www.saxonica.com/documentation8.9/extensions/functions/parse.html the extension function saxon:parse is supported in Saxon 8.9, so you should be able to use, for example:

<xsl:template match="values">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:copy-of select="saxon:parse(concat('&lt;root&gt;', ., '&lt;/root&gt;'))/*/node()"/>
  </xsl:copy>
</xsl:template>

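where you add the namespace declaration xmlns:saxon="http://saxon.sf.net/" to the stylesheet. The string is wrapped in a root element because the content of values holds more than one top-level element, and saxon:parse needs a well-formed document with a single root.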
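For context, here is a minimal sketch of how that template could sit in a complete stylesheet. It assumes the values element is in no namespace and that everything else in the input should be copied through unchanged. The identity template and exclude-result-prefixes are my additions and not something the question requires.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Assumption: copy everything that is not handled below unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Re-parse the escaped markup held in the string value of values.
       The wrapper element root only exists to give the parser a single
       document element; its children are what gets copied to the output. -->
  <xsl:template match="values">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="saxon:parse(concat('&lt;root&gt;', ., '&lt;/root&gt;'))/*/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Since the escaped text goes through a real XML parse, entity and character references inside it (such as the &amp;amp; in the sample) are resolved by the parser, so no character map is needed. The flip side is that the escaped content must be well-formed XML, otherwise saxon:parse will fail.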

Upvotes: 1
