JAWE

Reputation: 293

Formatting the output as XML with lxml

My program reads an input file, builds an lxml.etree from it, modifies the tree (for example, by adding a node), and then writes it back to a file. To write it back I use:

et.write(r'Documents\Write.xml', pretty_print=True)

And the output I have is:

<Variable Name="one" RefID="two"><Component Type="three"><Value>four</Value></Component></Variable>

While I'd like something like:

<Variable Name="one" RefID="two">
    <Component Type="three">
        <Value>four</Value>
    </Component>
</Variable>

Where am I mistaken? I've tried many solutions but none seems to work (beautifulsoup, tidy, parser...)

Upvotes: 9

Views: 5682

Answers (2)

tymm

Reputation: 553

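Don't use the default parser. When the input already contains whitespace between tags, lxml keeps those text nodes and pretty_print will not re-indent around them. Parse with a custom parser that sets remove_blank_text=True instead: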

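from lxml import etree

# remove_blank_text drops whitespace-only text nodes between tags while parsing
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(your_input_file, parser=parser)
# Do stuff with the tree here
tree.write(your_output_file, pretty_print=True)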
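
To see the difference, here is a small self-contained sketch. The SOURCE document, the extra Component with Type="five", and the helper name add_node_and_serialize are made up for illustration; only XMLParser(remove_blank_text=...), fromstring, SubElement and tostring are real lxml calls.

```python
from lxml import etree

# Hypothetical input: newlines between the tags, but no indentation.
SOURCE = b"""<Variable Name="one" RefID="two">
<Component Type="three">
<Value>four</Value>
</Component>
</Variable>"""


def add_node_and_serialize(xml_bytes, strip_blank_text):
    """Parse, append one element to the root, and pretty-print the result."""
    parser = etree.XMLParser(remove_blank_text=strip_blank_text)
    root = etree.fromstring(xml_bytes, parser)
    # Stand-in for "do stuff with the tree": add a second Component.
    etree.SubElement(root, "Component", Type="five")
    return etree.tostring(root, pretty_print=True, encoding="unicode")


# Parsed with the whitespace kept: pretty_print leaves the layout alone.
default_output = add_node_and_serialize(SOURCE, strip_blank_text=False)
# Parsed with blank text removed: pretty_print can indent every level.
clean_output = add_node_and_serialize(SOURCE, strip_blank_text=True)

print(default_output)
print(clean_output)
```

The second printout should be indented at every level, while the first keeps the original, unindented layout.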

Upvotes: 1

Skineffect

Reputation: 339

That's strange, because it is exactly the way it should work. Could you try this:

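from lxml import etree
root = etree.XML(YOUR_XML_STRING)  # substitute your own XML string
# encoding='unicode' makes tostring() return str rather than bytes
print(etree.tostring(root, pretty_print=True, encoding='unicode'))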

<Variable Name="one" RefID="two">
  <Component Type="three">
    <Value>four</Value>
  </Component>
</Variable>

This should generate a formatted string, which you can process yourself.
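
Note that pretty_print indents by two spaces, as in the output above, whereas the question asks for four. One possible way to get that, assuming lxml 4.5 or newer is installed (where etree.indent is available), is to re-indent the tree in place before serializing. The XML string below is just the one-line output from the question.

```python
from lxml import etree

# The single-line output from the question, used here as sample input.
xml_string = (
    '<Variable Name="one" RefID="two">'
    '<Component Type="three"><Value>four</Value></Component>'
    '</Variable>'
)

root = etree.XML(xml_string)

# Assumes lxml >= 4.5: indent() rewrites whitespace-only text and tails in place.
etree.indent(root, space="    ")

# The tree now carries its own indentation, so pretty_print is not needed here.
indented = etree.tostring(root, encoding="unicode")
print(indented)
```

Because indent() changes the tree itself, a later tree.write(...) call would write the same four-space layout to a file.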

Upvotes: 0
