Reputation: 572
I'm using XMLOutputFactory with the default Java implementation. When the output text contains a form feed, it produces an invalid XML file. Apparently the form feed character must be escaped, but the XML writer does not escape it. (Perhaps there are other characters that are supposed to be escaped as well but are not.)
Is this a bug? Is there a workaround, or is there a parameter I can provide to the XML writer to change the behavior?
The text I am writing may contain form feeds. I want to write it to the XML file and be able to read it back later.
Here's my sample code. The \f is the form feed; in both places it is written out as a raw ASCII 12 (form feed) character without being escaped. When I feed the output to the XML parser, it fails on the form feed with "An invalid XML character (Unicode: 0xc) was found".
public static void main(String[] args) throws XMLStreamException, FileNotFoundException, Exception {
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    try {
        XMLStreamWriter writer = factory.createXMLStreamWriter(
                new java.io.FileWriter("d:/xyz/ImportXml/out1.xml"));
        writer.writeStartDocument();
        writer.writeCharacters("\n");
        writer.writeStartElement("document");
        writer.writeCharacters("\n");
        writer.writeCharacters("some text character value \"of the\" field & more text \f in <brackets> here.");
        writer.writeCharacters("\n");
        writer.writeStartElement("data");
        writer.writeAttribute("name", "value \"of the\" field & more text \f in <brackets> here.");
        writer.writeEndElement();
        writer.writeCharacters("\n");
        writer.writeEndElement();
        writer.writeCharacters("\n");
        writer.writeEndDocument();
        writer.flush();
        writer.close();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
}
Upvotes: 4
Views: 2703
Reputation: 313
Not a bug. It's a feature. You can add your own character validation, or write your own implementation of the XMLStreamWriter interface.
The Oracle documentation at http://docs.oracle.com/javase/7/docs/api/javax/xml/stream/XMLStreamWriter.html says:
The XMLStreamWriter does not perform well formedness checking on its input. However the writeCharacters method is required to escape & , < and > For attribute values the writeAttribute method will escape the above characters plus " to ensure that all character content and attribute values are well formed.
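According to http://www.w3.org/TR/xml11/#charsets, the restricted characters in XML 1.1 are [#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F].
"\f" is the character with code #x0C, so it falls in the [#xB-#xC] range and the writer passes it through unescaped.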
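As a rough illustration of the "add character validation" option, here is a minimal sketch of a filter you could run over text before handing it to the writer. The class name XmlCharFilter and its methods are hypothetical (they are not part of the JDK or of StAX), and the sketch assumes the document is written as XML 1.0, where the allowed characters are #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF].

```java
/** Hypothetical helper (not part of the JDK): drops characters that XML 1.0 does not allow. */
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text with every disallowed code point removed. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // unpaired surrogates come back as-is and are rejected
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

With the code from the question this would be used as writer.writeCharacters(XmlCharFilter.strip(text)), and likewise for the value passed to writeAttribute. Note that stripping discards the form feed rather than preserving it. XML 1.0 cannot represent U+000C even as a character reference, so if the form feed has to survive a round trip you would probably need some application-level encoding of your own instead.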
Upvotes: 2