rbaleksandar

Reputation: 9701

How to avoid incorrect indentation in generated XML file when inserting ElementTree in some Element?

I'm dynamically generating a lot of XML data. Each document is intended as a test for one or more specific features of the software that processes it.

A document consists of multiple different blocks. To keep things simple, let's say the document below is what I work with:

<doc>
  <attributes>
    <attr/>
    <attr/>
  </attributes>
  <items>
    <item/>
    <item/>
    <item/>
  </items>
</doc>

The number of attr elements varies and the same applies to the number of item ones. However the structure of each doesn't change (only the data inside).

In order to keep my Python script more readable I have stored a bunch of template XML files each of which represents a repeating element (with all of its children if any). In my main script using lxml I create doc, attributes and items. Given parameters for defining how many item and attr elements there need to be in the resulting XML document I simply do a loop, load the respective template, adjust the data inside and then append it to the respective parent (here attributes and items elements):

from lxml import etree

attrs = 2
its = 10

root = etree.Element('doc')
root.addprevious(etree.Comment('...'))
doc = etree.ElementTree(root)

attributes = etree.SubElement(root, 'Attributes')
for a in range(attrs):
  # Load the template and append its root element to the parent
  attr = etree.parse('attribute.xml', parser=etree.XMLParser(remove_comments=True))
  attributes.append(attr.getroot())

items = etree.SubElement(root, 'Items')
for i in range(its):
  item = etree.parse('item.xml', parser=etree.XMLParser(remove_comments=True))
  items.append(item.getroot())

etree.tostring(doc, encoding='UTF-8', xml_declaration=True, pretty_print=True)

I've noticed one thing though, which isn't an error per se but is rather visible when looking at the generated XML document - the indentation is messed up exactly where the sub-tree XML has been inserted. I can fix this by using some XML formatting tool (for example Visual Studio Code's or Notepad++'s) but I'm wondering as to why this is happening.

Upvotes: 1

Views: 3268

Answers (1)

mzjn

Reputation: 51052

Use remove_blank_text=True when creating the XML parser:

parser=etree.XMLParser(remove_blank_text=True, remove_comments=True)

This will remove all ignorable whitespace and let the subsequent pretty-printing "start from scratch".
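
To see the difference, here is a minimal sketch that builds the same tree twice, once with each parser. The real item.xml isn't shown in the question, so the inline ITEM_TEMPLATE string and the build() helper are just stand-ins I made up for illustration. The idea is that the whitespace already stored inside the parsed template makes the serializer leave that subtree alone, while stripping it lets pretty_print indent everything consistently.

```python
from lxml import etree

# Hypothetical stand-in for the contents of item.xml (the real template is not shown).
ITEM_TEMPLATE = b"""<item>
  <name>example</name>
</item>
"""


def build(parser):
    """Build a small <doc> tree, appending two copies of the parsed template."""
    root = etree.Element('doc')
    items = etree.SubElement(root, 'items')
    for _ in range(2):
        # The template's own indentation is kept as text/tail nodes
        # unless the parser was told to drop blank text.
        items.append(etree.fromstring(ITEM_TEMPLATE, parser))
    return etree.tostring(root, pretty_print=True).decode()


# Parser as used in the question: whitespace from the template is preserved.
default_output = build(etree.XMLParser(remove_comments=True))

# Parser from this answer: ignorable whitespace is dropped while parsing.
clean_output = build(etree.XMLParser(remove_blank_text=True, remove_comments=True))

print(default_output)
print(clean_output)
```

Comparing the two printed documents should show the inserted item subtrees keeping their template-relative indentation in the first case and being re-indented to match their depth in the second.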

Upvotes: 4
