Reputation: 9701
I'm dynamically generating a lot of XML data. Each document is intended as a test for specific feature(s) of the software that processes it.
A document consists of multiple different blocks. To keep things simple, let's say the document below is what I work with:
<doc>
  <attributes>
    <attr/>
    <attr/>
  </attributes>
  <items>
    <item/>
    <item/>
    <item/>
  </items>
</doc>
The number of attr elements varies, and the same applies to the number of item elements. However, the structure of each doesn't change (only the data inside).
To keep my Python script more readable, I have stored a bunch of template XML files, each of which represents a repeating element (with all of its children, if any). In my main script, using lxml, I create doc, attributes and items. Given parameters defining how many item and attr elements there need to be in the resulting XML document, I simply loop, load the respective template, adjust the data inside and then append it to the respective parent (here the attributes and items elements):
from lxml import etree

attrs = 2
its = 10

root = etree.Element('doc')
root.addprevious(etree.Comment('...'))
doc = etree.ElementTree(root)

attributes = etree.SubElement(root, 'Attributes')
for a in range(attrs):
    # Load the template and append its root element to the parent
    attr = etree.parse('attribute.xml', parser=etree.XMLParser(remove_comments=True))
    attributes.append(attr.getroot())

items = etree.SubElement(root, 'Items')
for i in range(its):
    item = etree.parse('item.xml', parser=etree.XMLParser(remove_comments=True))
    items.append(item.getroot())

etree.tostring(doc, encoding='UTF-8', xml_declaration=True, pretty_print=True)
I've noticed one thing though, which isn't an error per se but is rather visible when looking at the generated XML document: the indentation is messed up exactly where the sub-tree XML has been inserted. I can fix this by using some XML formatting tool (for example Visual Studio Code's or Notepad++'s), but I'm wondering why this is happening.
Upvotes: 1
Views: 3268
Reputation: 51052
Use remove_blank_text=True when creating the XML parser:
parser=etree.XMLParser(remove_blank_text=True, remove_comments=True)
This will remove all ignorable whitespace and let the subsequent pretty-printing "start from scratch".
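To see the difference without any template files on disk, here is a minimal sketch. It assumes a made-up inline template (ITEM_TEMPLATE, standing in for the contents of item.xml) and a hypothetical helper named build; neither comes from the question. The helper parses the template twice, appends both copies to an items element and pretty-prints the result, once with and once without remove_blank_text.

```python
from lxml import etree

# Hypothetical stand-in for the contents of item.xml (not from the question).
ITEM_TEMPLATE = b"""<item>
  <name>x</name>
</item>"""


def build(remove_blank_text):
    """Build a small document from the template and pretty-print it."""
    parser = etree.XMLParser(remove_blank_text=remove_blank_text,
                             remove_comments=True)
    root = etree.Element('doc')
    items = etree.SubElement(root, 'items')
    for _ in range(2):
        # Each parse yields a fresh element tree, so appending moves
        # a distinct <item> into the parent every time.
        items.append(etree.fromstring(ITEM_TEMPLATE, parser=parser))
    return etree.tostring(root, pretty_print=True).decode()


if __name__ == '__main__':
    print(build(remove_blank_text=False))  # template whitespace is kept
    print(build(remove_blank_text=True))   # re-indented consistently
```

In the first printout the children of each item should keep the indentation they had in the template, because the pretty-printer leaves elements that already contain whitespace text nodes alone. In the second, those nodes are dropped at parse time, so the whole tree is indented uniformly.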
Upvotes: 4