Reputation: 5775
I have XML with huge nested structure. Like this one
<root>
<node1>
<subnode1>
<name1>text1</name1>
</subnode1>
</node1>
<node2>
<subnode2>
<name2>text2</name2>
</subnode2>
</node2>
</root>
I want convert it to
<root>
<node1>
<name1>text1</name1>
</node1>
<node2>
<name2>text2</name2>
</node2>
</root>
I was tried with following steps
from xml.etree import ElementTree as et
tr = etree.parse(path)
root = tr.getroot()
for node in root.getchildren():
for element in node.iter():
if (element.text is not None):
node.extend(element)
I also tried with node.append(element)
but it also does not work it adds element in end and i got infinity loop.
Any helps be appreciated.
Upvotes: 0
Views: 743
Reputation: 1667
A few points to mention here:
Firstly, your test element.text is not None
always returns True
if you parse your XML file as given above using xml.etree.Elementree
since at the end of each node, there is a new line character, hence, the text in each supposedly not-having-text node always have \n
character. An alternative is to use lxml.etree.parse
with a lxml.etree.XMLParser
that ignore the blank text as below.
Secondly, it's not good to append to a tree while reading through it. The same reason for why this code will give infinite loop:
>>> a = [1,2,3,4]
>>> for k in a:
a.append(5)
You could see @Alex Martelli answer for this question here: Modifying list while iterating regarding the issue.
Hence, you should make a buffer XML tree and build it accordingly rather than modifying your tree while traversing it.
from xml.etree import ElementTree as et
import pdb;
from lxml import etree
p = etree.XMLParser(remove_blank_text=True)
path = 'test.xml'
tr = et.parse(path, parser = p)
root = tr.getroot()
buffer = et.Element(root.tag);
for node in root.getchildren():
bnode = et.Element(node.tag)
for element in node.iter():
#pdb.set_trace()
if (element.text is not None):
bnode.append(element)
#node.extend(element)
buffer.append(bnode)
et.dump(buffer)
Sample run and results:
Chip chip@ 01:01:53@ ~: python stackoverflow.py
<root><node1><name1>text1</name1></node1><node2><name2>text2</name2></node2></root>
NOTE: you can always try to print a pretty XML tree using lxml
package in python following tutorials here: Pretty printing XML in Python since the tree I printed out is rather horrible to read by naked eyes.
Upvotes: 2