Cheeso

How to indent attributes when pretty-printing XML in Python?

Suppose I have XML like this:

 <graph label="Test" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cy="http://www.cytoscape.org" xmlns="http://www.cs.rpi.edu/XGMML"  directed="1">
    <foo>...</foo>
 </graph>

The start tag of the first element, with all of its attributes, appears on one line.

I have seen how to pretty-print the element tree with lxml, using code like this:

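from lxml import etree
...
def prettyPrintXml(filePath):
    assert filePath is not None
    # remove_blank_text drops ignorable whitespace so lxml can re-indent the tree
    parser = etree.XMLParser(resolve_entities=False, remove_blank_text=True,
                             strip_cdata=False)
    document = etree.parse(filePath, parser)
    # tostring() returns bytes when an encoding is given; decode before printing
    print(etree.tostring(document, pretty_print=True, encoding='utf-8').decode('utf-8'))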

... but with that, each element's start tag, attributes and all, still appears on a single line.

Is there a magic incantation to tell the pretty printer to insert newlines between the element's attributes so that, for example, no line exceeds 80 characters?

I would like the result to look something like this:

<graph label="Test"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:cy="http://www.cytoscape.org"
       xmlns="http://www.cs.rpi.edu/XGMML"  directed="1">
  <foo>...</foo>
</graph>

PS: I don't want to resort to running xmllint through subprocess.


Answers (1)

Steen

lxml has a pretty-print function built in: here's a tutorial which describes several ways to print XML. It has some limitations, though (limitations of the XML spec, according to lxml).
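For comparison, here is a minimal sketch using the standard library's analogous `xml.etree.ElementTree.indent` (Python 3.9+; this is not lxml, and the sample document is simplified to avoid namespace prefixes). It shows the kind of limitation the question runs into: built-in indentation re-indents child elements but leaves each start tag, with all of its attributes, on one line. `indent_with_stdlib` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def indent_with_stdlib(xml_text, space="  "):
    """Re-indent elements with ElementTree.indent (Python 3.9+).

    Only whitespace between elements is changed; attributes are never wrapped.
    """
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # in place: rewrites .text/.tail whitespace
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(indent_with_stdlib('<graph label="Test" directed="1"><foo>...</foo></graph>'))
```

However long the attribute list grows, `<graph ...>` stays on a single line; only the nesting of `<foo>` changes.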

This Stack Overflow question has several answers with more or less hacky solutions for pretty-printing XML, and I think you could adapt at least the regexp-based answer to your needs.
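A minimal sketch of the regexp idea, written from scratch rather than taken from that answer: post-process text that is already pretty-printed (one tag per line) and, for any start tag longer than the width limit, move each attribute onto its own line aligned under the first one. It assumes each start tag begins its line and that attribute values contain no `>` character; `wrap_long_start_tags` is a hypothetical helper name.

```python
import re

# name="value" or name='value'; assumes well-formed serializer output
ATTR_RE = re.compile(r'''[^\s=]+\s*=\s*(?:"[^"]*"|'[^']*')''')
# indent, tag name, attribute text, closing '>' or '/>', rest of the line.
# Comments (<!--), PIs (<?) and end tags (</) do not match the tag group.
START_TAG_RE = re.compile(r'^(\s*)<([^\s/>!?]+)\s+(.*?)(/?>)(.*)$')


def wrap_long_start_tags(pretty_xml, width=80):
    """Split over-long start tags so each attribute gets its own line."""
    out = []
    for line in pretty_xml.splitlines():
        m = START_TAG_RE.match(line)
        if len(line) <= width or not m:
            out.append(line)
            continue
        indent, tag, attr_text, close, rest = m.groups()
        attrs = ATTR_RE.findall(attr_text)
        if len(attrs) < 2:
            out.append(line)  # nothing to gain by wrapping
            continue
        # Continuation lines line up under the first attribute
        pad = "\n" + " " * (len(indent) + len(tag) + 2)
        out.append(indent + "<" + tag + " " + pad.join(attrs) + close + rest)
    return "\n".join(out)
```

With lxml, the input could be `etree.tostring(document, pretty_print=True, encoding='unicode')`. Because it operates on text rather than on the tree, it is fragile in the usual ways regexp-based XML handling is, so treat it as a quick hack.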

Fredrik Lundh (of ElementTree fame) has a very low-level description of how to print XML, which you could also customize to put each attribute on its own indented line.
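A minimal sketch of such a hand-written serializer (not Lundh's code): it walks the tree with the standard library's `xml.dom.minidom`, which keeps `xmlns` declarations as ordinary attributes. Each start tag is written on one line or, if that would exceed the width, with one attribute per line aligned under the first. Assumptions: whitespace-only text between elements is insignificant; comments, processing instructions and CDATA sections are dropped; `wrap_attributes` and `_write` are hypothetical names. minidom may list namespace declarations before the other attributes, so attribute order can differ from the source.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def _write(node, out, indent, step, width):
    tag = node.tagName
    # name="value" pairs, including any xmlns declarations minidom kept
    attrs = [f"{name}={quoteattr(value)}" for name, value in node.attributes.items()]
    one_line = indent + "<" + " ".join([tag] + attrs)
    # Wrap only if the start tag plus its closing '>' would exceed width
    if len(one_line) + 1 > width and len(attrs) > 1:
        # Align continuation lines under the first attribute
        pad = "\n" + " " * (len(indent) + len(tag) + 2)
        start = indent + "<" + tag + " " + pad.join(attrs)
    else:
        start = one_line

    # Keep child elements and non-blank text; skip everything else
    children = [c for c in node.childNodes
                if c.nodeType == c.ELEMENT_NODE
                or (c.nodeType == c.TEXT_NODE and c.data.strip())]
    if not children:
        out.append(start + "/>")
    elif len(children) == 1 and children[0].nodeType == children[0].TEXT_NODE:
        # Text-only element stays on one line, e.g. <foo>...</foo>
        out.append(start + ">" + escape(children[0].data.strip()) + "</" + tag + ">")
    else:
        out.append(start + ">")
        for child in children:
            if child.nodeType == child.TEXT_NODE:
                out.append(indent + step + escape(child.data.strip()))
            else:
                _write(child, out, indent + step, step, width)
        out.append(indent + "</" + tag + ">")


def wrap_attributes(xml_text, width=80, step="  "):
    """Serialize xml_text, wrapping long start tags one attribute per line."""
    root = minidom.parseString(xml_text).documentElement
    out = []
    _write(root, out, "", step, width)
    return "\n".join(out)
```

Putting every attribute on its own line, rather than packing as many as fit, keeps the logic simple; a greedy fill would get closer to the sample output in the question, where `xmlns` and `directed` share a line.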
