Luke

Reputation: 53

How to remove an element node but keep its children in an XML file using Python?

I'm fairly new to Python and to XML. I desperately need your help, as I'm running out of time to finish this project! I have an XML file that I need to process before importing it into Excel. My XML is structured as follows (a very small extract):

<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>

What I need to do is parse the XML file (ElementTree or lxml) and remove <first/> and <second>, while keeping the children of <second>, in order to get something like this:

<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <third/>
    <third/>
    <third/>
</Application>

I have already read and tried basically all the related questions I could find, but all I managed to achieve was to eliminate the whole <first/> element.

I'm using Python 3.6.2. The standard library (ElementTree) is preferred, but lxml is fine too.

Thanks in advance for any help you can give!

Upvotes: 1

Views: 483

Answers (1)

Keerthana Prabhakaran

Reputation: 3787

The ultimate task is to delete the parent elements in the given example while keeping their children. (Application is the root; first and second are the nodes; the third elements are the inner_nodes.)

1) Load your XML and get the root element ('Application' here).

2) For each direct child of the root, collect its children (tree -> nodes -> inner_nodes).

3) This gives you all the inner_nodes (the elements named 'third' here).

4) Remove the direct children of the root 'Application'.

5) Append all the inner_nodes to the root.

yourxmlfile.xml

<?xml version="1.0" encoding="UTF-8"?>
<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>

You can read your XML file with etree.parse():

>>> import xml.etree.ElementTree as etree
>>> # parse() returns an ElementTree; getroot() gives the <Application> element
>>> root = etree.parse('yourxmlfile.xml').getroot()
>>> etree.tostring(root)
b'<Application>\n    <first />\n    <second>\n        <third />\n        <third />\n        <third />\n    </second>\n</Application>'
>>> # list(node) replaces getchildren(), which was removed in Python 3.9
>>> inner_nodes = [list(node) for node in root]
>>> print(inner_nodes)
[[], [<Element 'third' at 0x10c272818>, <Element 'third' at 0x10c2727c8>, <Element 'third' at 0x10c272778>]]
>>> # iterate over a copy, since root is modified inside the loop
>>> for node in list(root):
...     root.remove(node)
... 
>>> etree.tostring(root)
b'<Application>\n    </Application>'
>>> for children in inner_nodes:
...     root.extend(children)
... 
>>> etree.tostring(root)
b'<Application>\n    <third />\n        <third />\n        <third />\n    </Application>'
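
If you want this as a reusable script rather than an interactive session, the same steps could be wrapped in a small function. This is only a minimal sketch using the standard library; the function name hoist_grandchildren and the variable xml_text are made up for illustration, and the XML is parsed from a string here instead of from your file.

```python
import xml.etree.ElementTree as ET


def hoist_grandchildren(root):
    """Replace every direct child of root with that child's own children.

    Hypothetical helper name, used only for this illustration.
    """
    # Collect the grandchildren first, before the tree is modified.
    grandchildren = [inner for child in root for inner in child]
    # Iterate over a copy, because root changes while we remove nodes.
    for child in list(root):
        root.remove(child)
    # Attach the collected grandchildren directly to the root.
    root.extend(grandchildren)
    return root


# Sample data copied from the question (assumed to match the real file).
xml_text = """<Application>
    <first/>
    <second>
        <third/>
        <third/>
        <third/>
    </second>
</Application>"""

root = hoist_grandchildren(ET.fromstring(xml_text))
print([child.tag for child in root])  # ['third', 'third', 'third']
```

To save the result, something like ET.ElementTree(root).write('output.xml', encoding='UTF-8', xml_declaration=True) should work, where 'output.xml' is just a placeholder name. Note that the original whitespace between elements is carried over as-is, so the indentation of the output may look uneven.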

Upvotes: 1
