Ravitej Rai

Reputation: 55

Parsing xml in python to get all child elements

I have parsed an XML file to get all its elements, and I am getting the following output:

[<Element '{urn:mitel:params:xml:ns:yang:vld}vld-list' at 0x0000000003059188>, <Element '{urn:mitel:params:xml:ns:yang:vld}vl-id' at 0x00000000030689F8>, <Element '{urn:mitel:params:xml:ns:yang:vld}descriptor-version' at 0x0000000003068A48>]

For each element of the list, I need only the value between } and ' (e.g. vld-list, vl-id, descriptor-version).

This is my code so far:

import xml.etree.ElementTree as ET  
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')  
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:  
    all_descendants = list(elem.iter())
    print(all_descendants)

How can I achieve this?

Upvotes: 0

Views: 13002

Answers (1)

Adrian W

Reputation: 5026

The text in {} is the namespace URI of the element's qualified name (QName); ElementTree stores every tag in this {namespace}localname form. AFAIK there is no method in xml.etree.ElementTree that returns only the local name. So, you have to either

  • extract the local part of the name with string handling, as already proposed in a comment to your question,
  • use lxml.etree instead of xml.etree.ElementTree and apply xpath('local-name()') on each element,
  • or provide an XML source without namespace. You can strip the namespace with XSLT.
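For the first option, a minimal sketch of the string handling, assuming tags come from xml.etree.ElementTree in the {uri}local form described above. split_qname is a hypothetical helper name used only for illustration; it returns the namespace and the local name separately, and treats a tag without braces as having no namespace.

```python
import xml.etree.ElementTree as ET


def split_qname(tag):
    """Split an ElementTree tag such as '{uri}local' into (uri, local).

    Hypothetical helper: tags without a namespace come back as (None, tag).
    """
    if tag.startswith('{'):
        # Drop the leading '{' and split once on the closing '}'.
        uri, local = tag[1:].split('}', 1)
        return uri, local
    return None, tag


root = ET.fromstring('<foo xmlns="urn:mitel:params:xml:ns:yang:vld"><vl-id/></foo>')
for e in root.iter():
    print(split_qname(e.tag))
```

Splitting on '}' rather than ':' matters here, because URN namespaces such as urn:mitel:params:xml:ns:yang:vld contain colons themselves.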

So, given this XML input:

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
    <bar>
        <baz x="1"/>
        <yet>
            <more>
                <nested/>
            </more>
        </yet>
    </bar>
    <bar/>
</foo>

you can print only the local names with this variation of your program:

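import xml.etree.ElementTree as ET

tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:
    # A namespaced tag looks like '{uri}local'; keep only the part after '}'.
    # Taking [-1] instead of [1] also handles tags that have no namespace.
    all_descendants = [e.tag.split('}', 1)[-1] for e in elem.iter()]
    print(all_descendants)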

Output:

['bar', 'baz', 'yet', 'more', 'nested']
['bar']
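To try this without the original file, here is a minimal self-contained sketch that inlines the sample XML above (with the XML declaration dropped) and parses it with ET.fromstring. The local_names helper is a hypothetical name introduced only for this illustration; it does the same split as the loop above.

```python
import xml.etree.ElementTree as ET

# Sample input from above, inlined so the sketch runs without a file.
XML = """<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
    <bar>
        <baz x="1"/>
        <yet>
            <more>
                <nested/>
            </more>
        </yet>
    </bar>
    <bar/>
</foo>"""


def local_names(root):
    """For each direct child of root, list the local names of it and all its descendants."""
    return [[e.tag.split('}', 1)[-1] for e in child.iter()] for child in root]


root = ET.fromstring(XML)
for names in local_names(root):
    print(names)
```

It prints the same two lists as the file-based version.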

The version with lxml.etree and xpath('local-name()') looks like this:

import lxml.etree as ET

tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:
    # The XPath function local-name() returns the element name without its namespace.
    all_descendants = [e.xpath('local-name()') for e in elem.iter()]
    print(all_descendants)

The output is the same as with the string handling version.


For stripping the namespace completely from your input, you can apply this XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

Run against the transformed file, your original program then outputs:

[<Element 'bar' at 0x04583B40>, <Element 'baz' at 0x04583B70>, <Element 'yet' at 0x04583BD0>, <Element 'more' at 0x04583C30>, <Element 'nested' at 0x04583C90>]
[<Element 'bar' at 0x04583CC0>]

Now the elements themselves carry no namespace, so there is nothing left to strip in your program.

You can apply the XSLT beforehand with xsltproc; then you don't need to change your program at all. Alternatively, you can apply the XSLT in Python, but this also requires you to use lxml.etree. So, the last variation of your program looks like this:

import lxml.etree as ET

tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')
# Compile the namespace-stripping stylesheet above (saved as stripns.xslt) and apply it.
xslt = ET.parse('stripns.xslt')
transform = ET.XSLT(xslt)
tree = transform(tree)

root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)
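If you would rather stay with the standard library, a different technique (not XSLT) is to rewrite the tags of the already-parsed tree in place. This is a minimal sketch under stated assumptions: only element tags are cleaned, namespaced attributes are left untouched, and strip_namespaces is a hypothetical helper name. The sample XML is a shortened, inlined version of the input above.

```python
import xml.etree.ElementTree as ET

# Shortened sample input, inlined so the sketch runs without a file.
XML = '<foo xmlns="urn:mitel:params:xml:ns:yang:vld"><bar><baz x="1"/></bar><bar/></foo>'


def strip_namespaces(root):
    """Remove the '{uri}' prefix from every element tag under (and including) root, in place.

    Hypothetical helper: attribute names are not touched.
    """
    for e in root.iter():
        # Skip non-string tags (e.g. comments, if the parser was told to keep them).
        if isinstance(e.tag, str) and e.tag.startswith('{'):
            e.tag = e.tag.split('}', 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XML))

# all items: this loop is the original program unchanged
for elem in root:
    print(list(elem.iter()))
```

After the rewrite, the elements print as <Element 'bar' at ...> and so on, just like the XSLT-stripped input, and plain tag names work again in find() and iter().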

Upvotes: 1
