How to pretty print xml in python without generating a DOM tree?

Question

Generating a DOM tree is too expensive for very large xml data. Is there a method to accomplish the printing without generating it? I am using python-2.7.

Djizeus · Accepted Answer

Whatever the language, the way to parse a XML document without generating a tree is to use event-oriented parsers. With these kinds of parser, you give to the parser some event handlers that the parser will call at specific points of the processing: beginning of a node, end of a node, beginning of data, etc.

So you can use that kind of parser and go to a new line each time there is a new node, and increase indentation where you are entering a node and decrease indentation when you are exiting a node. Because of the way these parsers work, it will be tricky to look ahead to see for example if a node fit in a line, so the pretty print may not be as pretty as when working with a tree (or you can, but it would be complicated).

In python, there are 3 event-driven parsers that come with the standard library (in no particular order):

ElementTree.iterparse()
pyexpat
sax (SAX is a well-known event-driven XML parsing API)

I suggest you have a look at them and try to play with.

How to pretty print xml in python without generating a DOM tree?

Answers (1)

Related Questions