Reputation: 118
It looks like JAXP allows assigning any value to a document node, including <, >, & and other reserved characters. Playing with XML reserved characters and XSLT raised a question. Consider the following code:
```java
// Build an empty DOM document
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
// ... (row and printer are created in code omitted here)
Element field = doc.createElement("col");
field.setTextContent("<p>&]]"); // reserved characters passed in unescaped
row.appendChild(field);
// ...
// Apply templateName.xsl to the DOM and write the result to printer
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("templateName.xsl"));
Transformer transformer = factory.newTransformer(xslt);
transformer.transform(new DOMSource(doc), new StreamResult(printer));
```
Now, if "templateName.xsl" contains

```xml
<xsl:value-of select="col" disable-output-escaping="yes"/>
```

the output looks like `<p>&]]`, and if it contains

```xml
<xsl:value-of select="col"/>
```

the output is `&lt;p&gt;&amp;]]`.
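A self-contained sketch that reproduces both outputs, assuming the JDK's built-in `TransformerFactory`. The inline stylesheet, the `row` root element, and the `DoeDemo`/`run` names are hypothetical stand-ins for templateName.xsl and the elided code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DoeDemo {
    // Hypothetical inline stand-in for templateName.xsl; toggles disable-output-escaping.
    static String stylesheet(boolean disableEscaping) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='row/col' disable-output-escaping='"
            + (disableEscaping ? "yes" : "no") + "'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Builds <row><col>...</col></row> the way the question does, then transforms it to a stream.
    static String run(boolean disableEscaping) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element row = doc.createElement("row");
        Element field = doc.createElement("col");
        field.setTextContent("<p>&]]");
        row.appendChild(field);
        doc.appendChild(row);

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet(disableEscaping))));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("disable-output-escaping=yes: " + run(true));
        System.out.println("disable-output-escaping=no:  " + run(false));
    }
}
```

With the JDK's processor the first call should return the raw `<p>&]]` and the second an escaped form such as `&lt;p&gt;&amp;]]`.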
So basically my question is: what kind of internal data representation does JAXP use such that `<p>&]]` is OK? It cannot be a text node, and it cannot be a CDATA node either. What is it? I believe a valid XML document must be supplied for a transformation. On the other hand, the disable-output-escaping attribute indicates that special characters should be output as-is; does that mean our "col" node is kept exactly as set in the code? How can the XML document be valid then?
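One way to see what the DOM actually holds is to inspect the node that `setTextContent` creates and then serialize the tree with an identity transformer. A minimal sketch, assuming the JDK's default DOM and JAXP implementations (the `TextNodeDemo`, `buildDoc` and `serialize` names are hypothetical):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextNodeDemo {
    // Builds <col> with the same raw string the question uses.
    static Document buildDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element col = doc.createElement("col");
        col.setTextContent("<p>&]]");
        doc.appendChild(col);
        return doc;
    }

    // Identity transform: this is the step where markup characters get escaped.
    static String serialize(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDoc();
        Node child = doc.getDocumentElement().getFirstChild();
        System.out.println(child.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(child.getNodeValue());                  // <p>&]]
        System.out.println(serialize(doc));                        // escaped markup
    }
}
```

In this sketch the child is an ordinary text node whose value is the literal string `<p>&]]`; the `&lt;` and `&amp;` forms only appear when the tree is written out as markup.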
Upvotes: 2
Reputation: 118
OK, I think I've figured out how it works. All XML reserved characters must be escaped unless they are inside a CDATA node. What disable-output-escaping="yes" then does depends on the node type. For a text node, it undoes the escaping, so "&lt;" turns back into "<". For a CDATA node, it disables escaping and the CDATA content is output as-is. In either case, all tags enclosed in a text node are stripped off, while they are retained for CDATA (and escaped according to disable-output-escaping). So either DOMSource or Transformer (I'm not sure which one renders the DOM to XML) does the actual escaping of a DOM text node before the transformation, and CDATA is kept intact. So for a text node, disable-output-escaping should really read undo-xml-escaping, which resolves my confusion.
Anyway, thanks to Michael for the explanation!
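The first claim in the post above, that reserved characters have to be escaped unless they sit inside a CDATA section, can be checked on a plain DOM with the DOM Level 3 `LSSerializer`, independently of XSLT. A sketch assuming the JDK's DOM implementation supports `DOMImplementationLS` (the `CdataVsText` and `serialize` names are made up):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class CdataVsText {
    // Serializes <col> holding "<p>&]]" either as a Text node or as a CDATASection node.
    static String serialize(boolean asCdata) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element col = doc.createElement("col");
        String value = "<p>&]]";
        col.appendChild(asCdata ? doc.createCDATASection(value) : doc.createTextNode(value));
        doc.appendChild(col);

        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("text node:     " + serialize(false)); // reserved characters escaped
        System.out.println("CDATA section: " + serialize(true));  // characters kept inside <![CDATA[...]]>
    }
}
```

Both nodes hold exactly the same characters in memory; only the serialized form differs.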
Upvotes: 1
Reputation: 163498
disable-output-escaping typically only works if the output of the transformation is written straight to a serializer. Although the XSLT specification describes it in terms of an extension to the data model whereby there is an extra bit associated with every character in a text node saying "disable escaping of this character", most implementations are unlikely to allow you to store an instance of this model as a tree in memory, and the extra bit exists only when trees are streamed from the transformer to the serializer.
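A sketch of the case where the transformation result is a tree rather than a stream, assuming the JDK's built-in XSLT processor (the stylesheet and the `DoeToTree` names are hypothetical). Even with disable-output-escaping="yes", the characters end up as plain text in the result DOM and no `p` element is created:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DoeToTree {
    // Hypothetical stylesheet: wraps the value-of in <out> and asks for disabled escaping.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<out><xsl:value-of select='row/col' disable-output-escaping='yes'/></out>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static Document transformToTree() throws Exception {
        Document src = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element row = src.createElement("row");
        Element col = src.createElement("col");
        col.setTextContent("<p>&]]");
        row.appendChild(col);
        src.appendChild(row);

        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        DOMResult result = new DOMResult(); // tree result: no serializer involved
        t.transform(new DOMSource(src), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document out = transformToTree();
        // The characters were not re-parsed as markup, so there is no <p> element.
        System.out.println(out.getElementsByTagName("p").getLength()); // 0
        System.out.println(out.getDocumentElement().getTextContent());
    }
}
```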
(In Saxon's implementation, rather than using an extra bit per character, it inserts an x00 character into the data stream passing from the transformer to the serializer to switch escaping on or off; this relies on the fact that x00 is a legal character in Java but not in XML).
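A toy model of the in-band marker idea, for illustration only: this is not Saxon's code, and the `NulToggleEscaper` name and the choice to start with escaping switched on are assumptions. U+0000 works as a marker because it can never occur in XML character data, so the serializer can strip it without ambiguity:

```java
public class NulToggleEscaper {
    // Hypothetical marker character; it never appears in legal XML content.
    static final char TOGGLE = '\u0000';

    // Escapes <, > and & except between pairs of TOGGLE markers; markers are dropped.
    static String serialize(String stream) {
        StringBuilder out = new StringBuilder();
        boolean escaping = true; // assumption: escaping starts switched on
        for (int i = 0; i < stream.length(); i++) {
            char c = stream.charAt(i);
            if (c == TOGGLE) {
                escaping = !escaping; // flip the mode; the marker itself is never written
            } else if (escaping && c == '<') {
                out.append("&lt;");
            } else if (escaping && c == '>') {
                out.append("&gt;");
            } else if (escaping && c == '&') {
                out.append("&amp;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The same text once with escaping on, then once between two markers.
        String stream = "<p>&]]" + TOGGLE + "<p>&]]" + TOGGLE;
        System.out.println(serialize(stream)); // &lt;p&gt;&amp;]]<p>&]]
    }
}
```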
Upvotes: 2