Reputation: 3079
Simple code that parses and then re-serializes a small XML document does something strange. This
INPUT:
<html>
<input></input>
</html>
gives OUTPUT (not well-formed):
<html>
<input>
</html>
The same thing happens with <input/> or <br/>.
It does not happen inside <html2>, nor with other tags.
The code is standard:
// READ XML
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// PARSE
Document document = builder.parse(new InputSource(new StringReader(_xml_source)));
// WRITE XML
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(document), new StreamResult(buffer));
String output = buffer.toString();
Is it a known bug?
Upvotes: 1
Views: 684
Reputation: 159215
XSLT defines an output method, which can be xml, html, or text.
The specification says that the default output method should be html if the document element is <html> (in no namespace); otherwise it should be xml.
With the xml method, you will get <input/>.
With the html method, you will get <input>, because the HTML specification says so.
You can explicitly give the output method, if you want:
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
A document with an <html> root element will then be written as XML, i.e. with <input/>.
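To compare the two output methods side by side, here is a minimal sketch built on the code from the question. The class name OutputMethodDemo and the helper serialize are made up for illustration; the only real change is that OutputKeys.METHOD is set explicitly instead of being left to the default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
class OutputMethodDemo {

    // Parses xmlSource and writes it back using the given output method
    // ("xml" or "html"). Checked exceptions are wrapped to keep the sketch short.
    static String serialize(String xmlSource, String method) {
        try {
            // READ XML
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            Document document = builderFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlSource)));

            // WRITE XML with an explicit output method
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.METHOD, method);

            StringWriter buffer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(buffer));
            return buffer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<html><input></input></html>";
        System.out.println(serialize(source, "xml"));   // empty element written as <input/>
        System.out.println(serialize(source, "html"));  // empty element written as <input>
    }
}
```

With the transformer bundled in the JDK, the "xml" call should keep the output well-formed, while the "html" call reproduces the behaviour described in the question.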
Quotes
The default for the method attribute is chosen as follows. If
- the root node of the result tree has an element child,
- the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and
- any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,
then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.
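The three conditions quoted above can be sketched as a small check over a DOM tree. This is only an illustration of the rule, not the code any real transformer runs: the class DefaultOutputMethod and its methods choose and parse are hypothetical names, and trim() is used as an approximation of the XML definition of whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that mirrors the quoted rule; not a library API.
class DefaultOutputMethod {

    // Returns "html" or "xml" for a node standing in for the result-tree root.
    static String choose(Node root) {
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                // First element child: local part "html" in any case, null namespace URI.
                boolean isHtml = child.getNamespaceURI() == null
                        && "html".equalsIgnoreCase(child.getLocalName());
                return isHtml ? "html" : "xml";
            }
            boolean isText = type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE;
            if (isText && !child.getNodeValue().trim().isEmpty()) {
                // Non-whitespace text before the first element child.
                return "xml";
            }
        }
        return "xml"; // the root has no element child
    }

    // Namespace-aware parse, as in the question's code.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(parse("<html><input/></html>")));
        System.out.println(choose(parse("<html2><input/></html2>")));
    }
}
```

This also shows why an <html> element in the XHTML namespace would still be written with the xml method: its namespace URI is not null, so the second condition fails.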
Some HTML element types have no content. For example, the line break element BR has no content; its only role is to terminate a line of text. Such empty elements never have end tags. The document type definition and the text of the specification indicate whether an element type is empty (has no content) or, if it can have content, what is considered legal content.
Upvotes: 3