I want to pretty print an org.w3c.dom.Document without a schema

Question

I feel like I'm going mad. I want to pretty print an org.w3c.dom.Document (in Java) without a schema. Indentation alone is not enough: I also want useless empty lines and whitespace to be ignored. Somehow this doesn't happen. Every time I parse an XML file or write it back to a file, the DOM document contains text nodes made up only of whitespace (newlines, spaces, etc.). Isn't there a simple way to get rid of them, without a schema and without transforming the XML myself by iterating over all the nodes and removing the empty text nodes?
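
One common approach that needs no schema and no hand-written recursion is to select the whitespace-only text nodes with a single XPath query and delete them before the document is written. Note that DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) does not help in this situation, because it only takes effect when the parser validates against a DTD or schema. The following is a minimal sketch under that assumption; the class name WhitespaceStripper and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative only.
public class WhitespaceStripper {

    // Parses XML from a string with validation switched off (no schema or DTD needed).
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Removes every text node whose content is only whitespace.
    public static void stripWhitespaceNodes(Document doc) throws Exception {
        NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        // Copy the matches first so that removing nodes cannot disturb the list being iterated.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            toRemove.add(found.item(i));
        }
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root>\n\n\n   <child>content</child>\n\n</root>");
        stripWhitespaceNodes(doc);
        // Only the <child> element should be left under <root>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

One design caveat: this drops every whitespace-only text node, including ones that can be meaningful in mixed content such as the space in <p><b>a</b> <i>b</i></p>, so it suits data-style XML like the example in this question.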

Example: my input file looks like this (but with a lot more empty lines):


       content

I would like my output file to look like this:


  content

Note: I don't have a schema for the XML (so I'm forced to call builder.setValidating(false)), and I don't have the luxury of an internet connection when this code runs.
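
Turning validation off does not by itself stop the JDK's default parser from trying to load an external DTD named in a DOCTYPE, which can fail without a network connection. A minimal sketch, assuming the input may carry such a DOCTYPE, is to install an EntityResolver that answers every external lookup with an empty document. The class name OfflineParser and the DTD URL in the demo are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative only.
public class OfflineParser {

    public static Document parseOffline(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // No schema or DTD validation.
        dbf.setValidating(false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        // A non-validating parser may still try to read a DOCTYPE's external DTD.
        // Answer every external entity lookup with an empty document so nothing is fetched.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DTD URL is deliberately unreachable; the resolver means it is never requested.
        Document doc = parseOffline(
                "<!DOCTYPE root SYSTEM \"http://example.invalid/root.dtd\">"
                + "<root><child>content</child></root>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The trade-off is that any entities declared in the skipped DTD are unavailable, which is usually acceptable when the goal is only reformatting.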

Thanks!

UPDATE: I found something very close to what I need, and maybe it helps other soldiers fighting against XML documents without schemas:

org.apache.axis.utils.XMLUtils.normalize(document);

Source code here. Calling this after the Document is created and before it is written out with a Transformer produces the pretty output with no schema validation at all. JB Nizet also gave me a working answer, but I have the feeling some validation is going on behind the scenes of that code, which would make it different from my use case. I'll leave the question open for a few days, though, in case someone has an even better solution.
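
For the final "write it with a Transformer" step, a minimal sketch using only the standard javax.xml.transform API is shown here. It assumes the JDK's built-in transformer, which accepts the Apache-specific "{http://xml.apache.org/xslt}indent-amount" output property; other transformer implementations may treat that property differently. The class name PrettyWriter is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; the name is illustrative only.
public class PrettyWriter {

    // Serializes a DOM with indentation; whitespace-only text nodes should already be gone.
    public static String toPrettyString(Document doc, int indent) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Indent width; this property is understood by the JDK's built-in transformer.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document in memory, so it contains no whitespace text nodes.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element child = doc.createElement("child");
        child.setTextContent("content");
        root.appendChild(child);
        System.out.print(toPrettyString(doc, 2));
    }
}
```

If whitespace-only text nodes are still present in the DOM, the transformer generally writes them out as well, which is where leftover blank lines tend to come from; that is why a normalization step such as the one above is run first.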
