XML parsing issue in Python using xml.etree.ElementTree

Question

I have the following XML, generated from an HTTP response:

If you look at it carefully, the outer element contains another element with the same tag.

So the high-level XML structure is as follows:

Python's ElementTree parses only the outer element. Below is my code:

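import re
import xml.etree.ElementTree as xml

pattern = re.compile(r'()',
                     re.VERBOSE | re.MULTILINE)

# data holds the text of the HTTP response
for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)

    # first level below the root
    for results in responses:
        result = results.tag

        # second level: only these elements are visited, never their children
        for resources in results:
            resource = resources.tag
            temp = resources.attrib
            print(temp)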

This prints the following output (temp):

{'typeDisplayName': 'Host', 'type': 'h', 'name': 'tango'}

How can I fetch the attributes of the inner element?

Guillaume · Accepted Answer

Don't parse XML with regular expressions! That won't work. Use an XML parsing library instead, lxml for instance:

Edit: the code example now fetches only the top-level resources, then loops over them and tries to fetch their "sub-resources". This change was made after the OP's request in the comments.

from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes in a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else... no idea what the OP wants

    print(mashup)

That will output (key order may vary):

{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
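
Since the question asks about xml.etree.ElementTree specifically, here is a rough sketch of the same idea using only the standard library. The original XML is not shown above, so the SAMPLE document, its Response and Results element names, and the collect_resources helper are all assumptions made for illustration; only the Resource tag and the attribute names come from the code and output above. ElementTree's limited XPath has no ancestor axis, so instead of filtering with not(ancestor::Resource) this walks down from the root and then looks at the direct Resource children of each top-level Resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real XML is not shown, so this structure is assumed.
SAMPLE = """
<Response>
  <Results>
    <Resource name="tango" type="h" DisplayName="Host">
      <Resource name="charlie" type="vm" DisplayName="VM" baseHost="tango"/>
    </Resource>
  </Results>
</Response>
"""


def collect_resources(xml_text):
    """Return one dict per top-level Resource, with its single child nested."""
    root = ET.fromstring(xml_text)
    mashups = []
    # Assumed layout: root -> Results -> Resource -> Resource
    for results in root.findall("Results"):
        for resource in results.findall("Resource"):
            # copy the outer element's attributes
            mashup = dict(resource.attrib)
            # findall with a bare tag name matches direct children only
            children = resource.findall("Resource")
            if len(children) == 1:
                mashup["resource"] = dict(children[0].attrib)
            mashups.append(mashup)
    return mashups


print(collect_resources(SAMPLE))
```

The key point is that iterating over an element (or calling findall with a bare tag name) only visits its direct children, so the inner element has to be fetched with a second lookup on the outer one. If the real document nests differently, the two findall paths would need adjusting.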
