nick_gabpe

Reputation: 5775

How to make a nested XML structure flat with Python

I have an XML file with a huge, deeply nested structure, like this one:

<root>
 <node1>
  <subnode1>
    <name1>text1</name1>
  </subnode1>
 </node1>
 <node2>
  <subnode2>
     <name2>text2</name2>
  </subnode2>
 </node2>
</root>

I want to convert it to:

<root>
  <node1>
    <name1>text1</name1>
  </node1>
  <node2>
    <name2>text2</name2>
  </node2>
</root>

I tried the following:

from xml.etree import ElementTree as et

tr = et.parse(path)
root = tr.getroot()

for node in root:
  for element in node.iter():
    if element.text is not None:
      node.extend(element)

I also tried node.append(element), but that does not work either: it adds the element at the end and I get an infinite loop. Any help would be appreciated.

Upvotes: 0

Views: 743

Answers (1)

TuanDT

Reputation: 1667

A few points to mention here:

Firstly, if you parse the XML file above with xml.etree.ElementTree, your test element.text is not None is always True. Every element that only contains child elements still has text, namely the newline and indentation that follow its opening tag, so its text is something like '\n  ' rather than None. An alternative is to use lxml.etree.parse with an lxml.etree.XMLParser created with remove_blank_text=True, which discards that whitespace, as below.
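To see the whitespace problem without installing lxml, here is a minimal standard-library sketch. It parses a trimmed copy of the question's XML from an inline string (an assumption standing in for test.xml) and shows that container elements carry indentation text rather than None. Testing element.text.strip() instead is one stdlib-only workaround; it is a substitute for, not the same as, the lxml parser option described above.

```python
import xml.etree.ElementTree as ET

# Trimmed sample from the question, inlined (stands in for test.xml)
XML = """<root>
 <node1>
  <subnode1>
    <name1>text1</name1>
  </subnode1>
 </node1>
</root>"""

root = ET.fromstring(XML)

for element in root.iter():
    # Containers hold the newline + indentation that follows their opening tag
    print(element.tag, repr(element.text))

# Keep only elements whose text has non-whitespace content
leaves = [e.tag for e in root.iter() if e.text and e.text.strip()]
print(leaves)
```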

Secondly, it's not a good idea to append to a tree while iterating over it, for the same reason that this code loops forever:

>>> a = [1, 2, 3, 4]
>>> for k in a:
...     a.append(5)
...

See @Alex Martelli's answer to the question "Modifying list while iterating" for more on this issue.
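A common way out, sketched here for both the list example and a small tree: iterate over a snapshot taken before the loop starts (a[:] for the list, list(node.iter()) for the tree), so that mutations inside the loop do not feed back into the iteration. The tree part uses the standard library and hard-codes the name1/subnode1 tags from the question purely for illustration.

```python
import xml.etree.ElementTree as ET

a = [1, 2, 3, 4]
# Loop over a copy (a[:]): the loop sees only the four original items,
# so it stops even though `a` keeps growing
for k in a[:]:
    a.append(5)
print(a)

# Same idea for a tree: list() freezes the elements before any move
node = ET.fromstring("<node1><subnode1><name1>text1</name1></subnode1></node1>")
subnode = node.find("subnode1")
for element in list(node.iter()):
    if element.tag == "name1":
        subnode.remove(element)  # detach from its old parent...
        node.append(element)     # ...and re-attach directly under node1
print(ET.tostring(node, encoding="unicode"))
```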

Hence, you should build a separate buffer XML tree rather than modifying your tree while traversing it.

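from lxml import etree

# Drop whitespace-only text between tags, so elements that only
# contain child elements end up with element.text == None
p = etree.XMLParser(remove_blank_text=True)
path = 'test.xml'
tr = etree.parse(path, parser=p)
root = tr.getroot()

# Build the result in a separate buffer tree instead of modifying
# the tree while iterating over it
buffer = etree.Element(root.tag)

for node in root:
    bnode = etree.Element(node.tag)
    # list() snapshots the iterator first: appending an lxml element
    # to bnode moves it out of the source tree
    for element in list(node.iter()):
        if element.text is not None:
            bnode.append(element)
    buffer.append(bnode)

print(etree.tostring(buffer).decode())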

Sample run and results:

Chip chip@ 01:01:53@ ~: python stackoverflow.py
<root><node1><name1>text1</name1></node1><node2><name2>text2</name2></node2></root>
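If lxml is not available, the same buffer-tree idea can be sketched with the standard library alone. The stdlib XMLParser has no remove_blank_text option, so this version filters on element.text.strip() instead, and it copies each leaf with copy.deepcopy so the source tree is never modified. The helper name flatten and the inline XML string (standing in for test.xml) are hypothetical; treat this as a sketch, not a drop-in replacement.

```python
import copy
import xml.etree.ElementTree as ET

# Sample from the question, inlined (stands in for test.xml)
XML = """<root>
 <node1>
  <subnode1>
    <name1>text1</name1>
  </subnode1>
 </node1>
 <node2>
  <subnode2>
     <name2>text2</name2>
  </subnode2>
 </node2>
</root>"""

def flatten(root):
    """Hypothetical helper: return a new tree in which each top-level
    child of `root` keeps only descendants with non-whitespace text."""
    buffer = ET.Element(root.tag)
    for node in root:
        bnode = ET.SubElement(buffer, node.tag)
        for element in node.iter():
            # Skip the node itself and whitespace-only containers
            if element is not node and element.text and element.text.strip():
                leaf = copy.deepcopy(element)  # leave the source tree intact
                leaf.tail = None               # drop the original indentation
                bnode.append(leaf)
    return buffer

result = flatten(ET.fromstring(XML))
print(ET.tostring(result, encoding="unicode"))
```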

NOTE: the tree printed above is hard to read with the naked eye. You can pretty-print it with the lxml package by following the tutorials in Pretty printing XML in Python.
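On Python 3.9 or later, the standard library can also pretty-print without lxml via xml.etree.ElementTree.indent, which inserts indentation whitespace into the tree in place (with lxml, etree.tostring(buffer, pretty_print=True) serves the same purpose). A minimal sketch, using a hand-built tree equal to the flattened result above:

```python
import xml.etree.ElementTree as ET

tree = ET.fromstring(
    "<root><node1><name1>text1</name1></node1>"
    "<node2><name2>text2</name2></node2></root>"
)
# indent() mutates the tree in place, adding newlines + 2-space indents
ET.indent(tree, space="  ")
pretty = ET.tostring(tree, encoding="unicode")
print(pretty)
```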

Upvotes: 2
