Reputation:
I am working on a system that should be able to read any (or at least, any well-formed) XML file, manipulate a few nodes and write them back into that same file. I want my code to be as generic as possible and I don't want
The aim is to leave the source file entirely unchanged except for the modified nodes, which are retrieved via XPath. I would like to get by with the standard javax.xml APIs.
My progress so far:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// These URIs are parser features, so they belong in setFeature(), not setAttribute()
factory.setFeature("http://xml.org/sax/features/namespaces", true);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setNamespaceAware(true);
factory.setIgnoringElementContentWhitespace(false);
factory.setIgnoringComments(false);
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(inStream));
This successfully loads the XML source into an org.w3c.dom.Document, skipping DTD validation. I can do my replacements, and then I use
Source source = new DOMSource(document);
Result result = new StreamResult(getOutputStream(getPath()));
// Write the DOM document to the file
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);
to write it back. This is nearly perfect, except that the DOCTYPE declaration is gone, no matter what I do. While debugging, I saw that there is a DeferredDoctypeImpl [log4j:configuration: null] object in the Document after parsing, but it seems to be wrong, empty or ignored. The file I tested on starts like this (it is the same for other file types):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">
[...]
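The `DeferredDoctypeImpl [log4j:configuration: null]` seen in the debugger is not necessarily empty: Xerces-based DOM nodes print themselves as `[nodeName: nodeValue]`, and the DOM specification defines the node value of a DocumentType as null. A minimal sketch for checking what the parser actually kept, assuming the JDK's default (Xerces-based) parser; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class InspectDoctype {

    /** Returns "name|publicId|systemId" of the parsed DOCTYPE, or null if there is none. */
    public static String describeDoctype(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // As in the question: record the DOCTYPE but do not fetch log4j.dtd.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        DocumentType doctype = document.getDoctype();
        if (doctype == null) {
            return null;
        }
        // getNodeValue() is always null for a DocumentType; the useful data is here.
        return doctype.getName() + "|" + doctype.getPublicId() + "|" + doctype.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">\n"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>";
        System.out.println(describeDoctype(xml));
    }
}
```

If the name and system ID come back populated, the information survived parsing and is being lost only on the way out.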
I think there are a lot of (easy?) ways involving hacks or pulling additional JARs into the project, but I would rather solve it with the tools I already use.
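One way to keep the Transformer is a minimal sketch like the following, assuming the JDK's built-in identity Transformer: it does not write a DOCTYPE from the DocumentType node of a DOMSource, but it does write one when the identifiers are supplied through `OutputKeys.DOCTYPE_SYSTEM` and `OutputKeys.DOCTYPE_PUBLIC`. The class and method names are hypothetical, and a StringWriter stands in for the question's `getOutputStream(getPath())`:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeTransformerSketch {

    /** Parses XML without fetching the external DTD (assumes the JDK's Xerces-based parser). */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the DOM, passing the DOCTYPE identifiers to the Transformer explicitly. */
    public static String write(Document document) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = document.getDoctype();
        if (doctype != null) {
            // The identity transform skips the DocumentType node itself,
            // so hand its identifiers over as output properties instead.
            if (doctype.getPublicId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">\n"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>";
        System.out.println(write(parse(xml)));
    }
}
```

The serializer names the DOCTYPE after the root element, which matches the usual case. An internal DTD subset is not carried over this way.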
Upvotes: 2
Views: 12138
Reputation: 1420
I tried using LSSerializer and could not get it to retain the DOCTYPE. This is the solution that Stephan probably used. Note: this is in Scala, but it uses a Java library, so it is easy to convert to your own code.
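import java.io.{File, FileOutputStream, OutputStreamWriter}
import org.w3c.dom.Element
import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

def transformXML(root: Element, file: String): Unit = {
  val doc = root.getOwnerDocument
  val format = new OutputFormat(doc) // takes the DOCTYPE identifiers from doc
  format.setIndenting(true)
  // Write with the same encoding that the OutputFormat declares
  val writer = new OutputStreamWriter(new FileOutputStream(new File(file)), format.getEncoding)
  try {
    val serializer = new XMLSerializer(writer, format)
    serializer.serialize(doc)
  } finally {
    writer.close() // release the file handle
  }
}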
Upvotes: 0
Reputation: 3265
Here's how you could do it using the LSSerializer found in the JDK:
private void writeDocument(Document doc, String filename)
        throws IOException {
    /*
     * Could extract "ls" to an instance attribute, so it can be reused.
     */
    DOMImplementationLS ls;
    try {
        ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().
            getDOMImplementation("LS");
    } catch (Exception e) {
        // newInstance() throws several reflective exceptions
        throw new IOException(e);
    }
    // try-with-resources closes the stream even if serialization fails
    try (OutputStream out = new FileOutputStream(filename)) {
        LSOutput lsout = ls.createLSOutput();
        /*
         * Pass the byte stream rather than a Writer, so that the serializer
         * encodes the characters with the same encoding it declares.
         */
        lsout.setByteStream(out);
        /*
         * If "doc" has been constructed by parsing an XML document, we
         * should keep its encoding when serializing it; if it has been
         * constructed in memory, its encoding has to be decided by the
         * client code.
         */
        lsout.setEncoding(doc.getXmlEncoding());
        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(doc, lsout);
    }
}
Needed imports:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
I know this is an old question which has already been answered, but I think the technical details might help someone.
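To check in memory that the LSSerializer route keeps the DOCTYPE, the same calls can target a byte array instead of a file. This is a sketch assuming the JDK's bundled DOM implementation; the class and method names are hypothetical, UTF-8 is hard-coded for the demo, and the input is the log4j header from the question:

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsSerializerRoundTrip {

    /** Parses xml, then serializes it back to a UTF-8 string with LSSerializer. */
    public static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Keep the DOCTYPE node, but do not try to fetch the external DTD file.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput lsout = ls.createLSOutput();
        lsout.setByteStream(bytes);   // the serializer does the encoding itself
        lsout.setEncoding("UTF-8");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(doc, lsout);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">\n"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>";
        System.out.println(roundTrip(xml));
    }
}
```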
Upvotes: 0
Reputation:
Sorry, I got it working now by using an XMLSerializer instead of the Transformer.
Upvotes: 1