Ravi

Reputation: 3756

Python toprettyxml() formatting problems

I'm trying to process XML using Python's minidom, and then output the result using toprettyxml(). I ran into two problems:

  1. There are added blank lines.
  2. There are added newlines and tabs for text nodes.

Here's the code and output:

$ cat test.py
from xml.dom import minidom

dom = minidom.parse("test.xml")
print dom.toprettyxml()

$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>


$ python test.py
<?xml version="1.0" ?>
<store>


    <product>


        <fruit>
            orange
        </fruit>


    </product>


</store>

I can work around problem 1 by using strip() to remove the blank lines, and I can work around problem 2 with the hack (fixed_writexml) described at this link: http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/. However, that hack is almost 3 years old now, so I was wondering whether there's a better solution. I'm open to using something other than minidom, but I'd like to avoid adding external packages like lxml.

Upvotes: 2

Views: 4123

Answers (1)

CharlesB

Reputation: 90316

One solution is to patch the minidom library with the patch proposed for the bug you mention.

I haven't tested it myself, and it's a bit hacky too, so it may not suit you!
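
If patching the library isn't appealing, here is a minimal sketch of another approach that stays within the standard library: remove the whitespace-only text nodes from the parsed DOM before calling toprettyxml(), then drop any blank lines from the result. The helper names strip_whitespace_nodes and pretty are my own, not part of minidom, and the sample document is just the test.xml from the question inlined as a string. Note that this only addresses problem 1 by itself; whether `<fruit>orange</fruit>` stays on one line depends on the Python version (as far as I know, recent releases keep it on one line, while older ones still need the writexml hack).

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper, not part of minidom.
    """
    # Iterate over a copy, because removeChild mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


def pretty(xml_text, indent="    "):
    """Parse xml_text and return it re-indented, without blank lines."""
    dom = minidom.parseString(xml_text)
    strip_whitespace_nodes(dom)
    pretty_xml = dom.toprettyxml(indent=indent)
    # Filter out any lines that are empty or whitespace-only.
    return "\n".join(line for line in pretty_xml.splitlines() if line.strip())


# The test.xml from the question, inlined for a self-contained example.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<store>\n"
    "    <product>\n"
    "        <fruit>orange</fruit>\n"
    "    </product>\n"
    "</store>\n"
)

result = pretty(sample)
print(result)
```

Stripping the nodes first matters because the blank lines come from the original indentation being kept as text nodes, which toprettyxml() then indents again on top of its own formatting.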

Upvotes: 2
