Reputation: 53
I'm fairly new to Python and to the XML world. I desperately need your help, as I'm running out of time to finish this project! Basically, I have an XML file that I need to process before importing it into Excel. My XML is structured as follows (a very small extract):
<?xml version="1.0" encoding="UTF-8"?>
<Application>
<first/>
<second>
<third/>
<third/>
<third/>
</second>
</Application>
What I need to do is parse the XML file (ElementTree or lxml) and eliminate <first/> and the <second> wrapper, while keeping the children of <second>, in order to get something like this:
<?xml version="1.0" encoding="UTF-8"?>
<Application>
<third/>
<third/>
<third/>
</Application>
I have already read and tried basically all the related questions I could find, but all I managed to achieve was to eliminate the whole <first/> element.
I'm using Python 3.6.2; the standard library (ElementTree) is preferred, but lxml is fine too.
Thanks in advance for any help you can give!
Upvotes: 1
Views: 483
Reputation: 3787
The ultimate task is to delete the parent nodes in the given example (Application is the root; first and second are nodes; the third elements are inner_nodes).
1) Load your XML (and find the node you consider to be 'Application').
2) Get the list of inner_nodes (tree -> nodes -> inner_nodes) for your tree.
3) Get all the inner_nodes (the nodes named 'third' here).
4) Remove the immediate children of the root, 'Application'.
5) Append all the inner_nodes to your root!
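The steps above can be sketched as one small function. This is only a minimal sketch: the name flatten_one_level and the inline XML string are my own placeholders, not part of the question, and it assumes you want every grandchild of the root promoted, whatever its tag.

```python
import xml.etree.ElementTree as ET

# Sample document inlined so the sketch runs without a file on disk.
XML = """<Application>
 <first/>
 <second>
  <third/>
  <third/>
  <third/>
 </second>
</Application>"""


def flatten_one_level(root):
    """Replace the children of root with its grandchildren (hypothetical helper)."""
    # Steps 2-3: collect the grandchildren before modifying the tree.
    inner_nodes = [inner for node in root for inner in node]
    # Step 4: remove the immediate children; iterate over a copy of the list.
    for node in list(root):
        root.remove(node)
    # Step 5: attach the collected nodes directly under the root.
    root.extend(inner_nodes)
    return root


root = ET.fromstring(XML)  # Step 1: load the XML and get the root element.
flatten_one_level(root)
print([child.tag for child in root])
```

Collecting the grandchildren first matters: once a wrapper such as second is removed from the root, you no longer reach its children by walking the root.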
yourxmlfile.xml
<?xml version="1.0" encoding="UTF-8"?>
<Application>
 <first/>
 <second>
 <third/>
 <third/>
 <third/>
 </second>
</Application>
And you can read your XML file with etree.parse():
>>> import xml.etree.ElementTree as etree
>>> root=etree.parse('yourxmlfile.xml').getroot()  # parse() returns an ElementTree; getroot() gives the <Application> element
>>> etree.tostring(root)
b'<Application>\n <first />\n <second>\n <third />\n <third />\n <third />\n </second>\n</Application>'
>>> inner_nodes=[list(node) for node in root]  # list(element) gives its direct children; getchildren() was removed in Python 3.9
>>> print(inner_nodes)
[[], [<Element 'third' at 0x10c272818>, <Element 'third' at 0x10c2727c8>, <Element 'third' at 0x10c272778>]]
>>> for node in list(root):root.remove(node)  # iterate over a copy while removing
...
>>> etree.tostring(root)
b'<Application>\n </Application>'
>>> for children in inner_nodes:root.extend(children)  # re-attach the grandchildren under the root
...
>>> etree.tostring(root)
b'<Application>\n <third />\n <third />\n <third />\n </Application>'
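Since the goal is a file that can be imported into Excel, you will probably want to write the result back out with the XML declaration, which etree.tostring() does not include in the output above. A minimal sketch, assuming the tree has already been flattened; the io.BytesIO buffer is just a stand-in for a real output path of your choosing:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the already flattened tree from the session above.
root = ET.fromstring("<Application><third/><third/><third/></Application>")

# Wrap the root element in an ElementTree so it can be written out.
tree = ET.ElementTree(root)

# Writing to an in-memory buffer here; pass a filename instead to save to disk.
buffer = io.BytesIO()
tree.write(buffer, encoding="UTF-8", xml_declaration=True)

result = buffer.getvalue()
print(result.decode("utf-8"))
```

With xml_declaration=True the output starts with an <?xml ...?> line; ElementTree writes the declaration with single quotes, which is still valid XML.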
Upvotes: 1