Reputation: 76
I have an XML file in which all the nodes that contain information wrap it in CDATA. This information may be formatted with HTML tags, something like this:
<EventList>
<Text><![CDATA[<p>Some text <i>is</i> formatted! This is a character entity &#39;</p>]]></Text>
<ShortText><![CDATA[Some other is only plain]]></ShortText>
<!-- others more -->
</EventList>
I want to transform this with XSLT into an (X)HTML page:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
<xsl:output
method="html"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
media-type="application/xhtml+xml"
encoding="utf-8"
omit-xml-declaration="yes"
indent="no"
/>
<xsl:template match="Text">
<h2><xsl:copy-of select="text()"/></h2>
</xsl:template>
<xsl:template match="ShortText">
<div><xsl:copy-of select="."/></div>
</xsl:template>
</xsl:stylesheet>
But applying this transformation produces strange behavior. The HTML tags I put in the XSLT are parsed and interpreted correctly by the browser, but the tags inside the CDATA are stripped of the <, > and & characters, producing this output:
<h2>pSome text iis/i formatted! This is a character entity #39;/p</h2>
<div>Some other is only plain</div>
At first it looked like an issue in the <xsl:output> definition, but I'm still stuck on this. I've tried both the shorthand XPath . and the function text(), but the output is the same.
Any suggestion is appreciated!
Upvotes: 1
Views: 4826
Reputation: 25034
Your XML says that the content of the Text element is a string of characters with no markup in it, which happens to contain a number of occurrences of XML delimiters like the left angle bracket and the ampersand. Your stylesheet says to write that string of characters out as a string of characters, without markup, so a conforming XSLT processor will do so, escaping the delimiters and producing as output something like
<h2 xmlns="http://www.w3.org/1999/xhtml"
>&lt;p&gt;Some text &lt;i&gt;is&lt;/i&gt; formatted!
This is a character entity &amp;#39;&lt;/p&gt;</h2>
<div xmlns="http://www.w3.org/1999/xhtml"
><ShortText xmlns="">Some other is only plain</ShortText></div>
I've introduced line breaks to keep the lines shorter. This is not what you are showing as your output, which is suggestive in itself.
The easiest way to get better results is to make your XML start telling the truth about the data: if you want the Text element to contain some HTML elements like p and i, then make it do so, and then use an identity transform on that part of your data.
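As a rough sketch of that first option: assume the source could be changed so that Text holds real elements in the XHTML namespace instead of a CDATA string (the reshaped input shown in the comment is an assumption, not the asker's actual file). The embedded markup can then be passed through with an identity template:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!--
  Assumed input, with real markup instead of CDATA (hypothetical):

  <EventList>
    <Text><p xmlns="http://www.w3.org/1999/xhtml">Some text <i>is</i> formatted!</p></Text>
    <ShortText>Some other is only plain</ShortText>
  </EventList>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- The xsl:output declaration from the question can be placed here unchanged. -->

  <!-- Identity transform: copy every node and attribute as it is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Do not copy the wrapper element itself, only process its children. -->
  <xsl:template match="EventList">
    <xsl:apply-templates select="Text|ShortText"/>
  </xsl:template>

  <!-- Replace Text with h2; its children go through the identity template. -->
  <xsl:template match="Text">
    <h2><xsl:apply-templates select="node()"/></h2>
  </xsl:template>

  <!-- Replace ShortText with div, again keeping only its content. -->
  <xsl:template match="ShortText">
    <div><xsl:apply-templates select="node()"/></div>
  </xsl:template>
</xsl:stylesheet>
```

Because the p and i elements are now real nodes in the input tree, the processor serializes them as markup and no escaping tricks are needed.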
If the broken design of this XML is something you are stuck with, then you can work around the damage by using the disable-output-escaping attribute on the xsl:value-of element. (Warning: the need to use disable-output-escaping is almost always a signal that something is wrong in the design.) This version of your template for Text produces output in which the string data of the input is written out as XHTML markup:
<xsl:template match="Text">
<h2><xsl:value-of select="string(.)"
disable-output-escaping="yes"/></h2>
</xsl:template>
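For context, here is a minimal sketch of how that template might sit in a complete stylesheet. The ShortText template is an addition for illustration: since that element holds plain text, it uses xsl:value-of without disabling escaping, which also avoids copying the ShortText element itself into the div.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- The xsl:output declaration from the question can be placed here unchanged. -->

  <!-- Text carries HTML as a string: write it out unescaped. -->
  <xsl:template match="Text">
    <h2><xsl:value-of select="string(.)"
        disable-output-escaping="yes"/></h2>
  </xsl:template>

  <!-- ShortText is plain text: output its string value, escaped as usual. -->
  <xsl:template match="ShortText">
    <div><xsl:value-of select="."/></div>
  </xsl:template>
</xsl:stylesheet>
```

Note that XSLT 1.0 does not require a processor to support disable-output-escaping, and it only has an effect when the processor itself serializes the result. Some browser-embedded processors are known to ignore it, so the workaround is more dependable when the transformation runs on the server or from a command-line processor.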
Upvotes: 2