Parsing nested XML with ElementTree

Question

I have the following XML format, and I want to pull out the values for Name, Region, and Status using Python's xml.etree.ElementTree module.

However, my attempt to get this information has been unsuccessful so far.


    
        uuid:asdfadsfasdf123123
        
        
            
                instancename
                US
                Active
            
        
    
    
        uuid:asdfadsfasdf234234
        
        
            
                instancename2
                US2
                Active

My code attempt:

import xml.etree.ElementTree as et

NAMESPACE = '{http://www.w3.org/2005/Atom}'
root = et.fromstring(XML_STRING)
entry_root = root.findall('{0}entry'.format(NAMESPACE))
for child in entry_root:
    content_node = child.find('{0}content'.format(NAMESPACE))
    for content in content_node:
        for desc in content.iter():
            print(desc.tag)
            name = desc.find('{0}Name'.format(NAMESPACE))
            print(name)

desc.tag is giving me the nodes I want to access, but name is returning None. Any ideas what's wrong with my code?

Output of desc.tag:

{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Name
{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Region
{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Status

gtlambert · Accepted Answer

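As your desc.tag output shows, Name, Region, and Status belong to the servicebus namespace, not the Atom namespace, so a lookup with the Atom prefix finds nothing. You can use lxml.etree with a prefix mapped to that namespace and query the values with XPath: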

content = '''

    
        uuid:asdfadsfasdf123123
        
        
            
                instancename
                US
                Active
            
        
    
    
        uuid:asdfadsfasdf234234
        
        
            
                instancename2
                US2
                Active
            
        
    
'''

from lxml import etree

tree = etree.XML(content)
ns = {'default': 'http://schemas.microsoft.com/netservices/2010/10/servicebus/connect'}

names = tree.xpath('//default:Name/text()', namespaces=ns)
regions = tree.xpath('//default:Region/text()', namespaces=ns)
statuses = tree.xpath('//default:Status/text()', namespaces=ns)

print(names)
print(regions)
print(statuses)

Output

['instancename', 'instancename2']
['US', 'US2']
['Active', 'Active']

This XPath/namespace functionality can be adapted to output the data in any format you require.
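
Since the question asked for xml.etree.ElementTree specifically, here is a minimal standard-library sketch of the same idea. One more thing to note about the original attempt: each desc yielded by content.iter() is already the Name, Region, or Status element itself, and find() only searches an element's children, so desc.find(...) would return None even with the right namespace. The sample XML below is an assumption, because the markup in the post did not survive: only entry, content, Name, Region, and Status are taken from the question, while feed, id, and the InstanceDescription wrapper are hypothetical names, as is the helper extract_instances.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Only <entry>, <content>, <Name>, <Region> and
# <Status> come from the question; <feed>, <id> and <InstanceDescription> are
# assumed names used to make the example self-contained.
XML_STRING = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>uuid:asdfadsfasdf123123</id>
    <content type="application/xml">
      <InstanceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect">
        <Name>instancename</Name>
        <Region>US</Region>
        <Status>Active</Status>
      </InstanceDescription>
    </content>
  </entry>
  <entry>
    <id>uuid:asdfadsfasdf234234</id>
    <content type="application/xml">
      <InstanceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect">
        <Name>instancename2</Name>
        <Region>US2</Region>
        <Status>Active</Status>
      </InstanceDescription>
    </content>
  </entry>
</feed>
"""

# Prefixes are arbitrary local labels; only the namespace URIs matter.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "sb": "http://schemas.microsoft.com/netservices/2010/10/servicebus/connect",
}


def extract_instances(xml_string):
    """Return one dict per Atom entry with its Name, Region and Status text."""
    root = ET.fromstring(xml_string)
    rows = []
    for entry in root.findall("atom:entry", NS):
        content = entry.find("atom:content", NS)
        if content is None:
            continue  # entry without a content element
        # ".//" searches all descendants of content, so the name of the
        # wrapper element does not need to be known.
        rows.append({
            "name": content.findtext(".//sb:Name", namespaces=NS),
            "region": content.findtext(".//sb:Region", namespaces=NS),
            "status": content.findtext(".//sb:Status", namespaces=NS),
        })
    return rows


if __name__ == "__main__":
    for row in extract_instances(XML_STRING):
        print(row)
```

Grouping the lookups per entry keeps each name paired with its own region and status, which three separate document-wide queries do not guarantee if an entry is missing one of the fields.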
