Reputation: 3818
I have the following XML, generated by an HTTP response:
<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
<Results completed="true" total="25" matched="5" processed="25">
<Resource type="h" DisplayName="Host" name="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
</PerfData>
<Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
</PerfData>
</Resource>
</Resource>
</Results>
</Response>
If you look at this carefully, the outer <Resource> element contains another <Resource> element nested inside it.
So the high-level XML structure is as below:
<Resource>
    <Resource>
    </Resource>
</Resource>
Python's ElementTree seems to parse only the outer element. Below is my code:
import re
import xml.etree.ElementTree as xml

# data holds the raw text of the HTTP response
pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE)
for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)
    for results in responses:
        result = results.tag
        for resources in results:
            resource = resources.tag
            temp = {}
            temp = resources.attrib
            print(temp)
This prints the following output (temp):
{'typeDisplayName': 'Host', 'type': 'h', 'name': 'tango'}
How can I fetch the attributes of the inner <Resource> element?
Upvotes: 0
Views: 296
Reputation: 10961
Don't parse XML with regular expressions! That won't work. Use an XML parsing library instead, lxml for instance:
Edit: the code example now fetches the top-level resources only, then loops over them and tries to fetch their "sub-resources". This change was made after the OP's request in a comment.
from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy the resource attributes into a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find exactly one sub-resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # otherwise it is unclear what the OP wants
    print(mashup)
That will output:
{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
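Since the question uses ElementTree, the same idea can probably be sketched with the standard library alone, without lxml. The following is only a minimal sketch under a few assumptions: the sample is a trimmed copy of the XML from the question with the closing tag written as </Results>, and the helper name resource_tree and the 'resources' key are hypothetical names chosen for illustration. It recurses, so it should also cope with more than one nested <Resource> per level.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: the closing
# tag is </Results>, so that the document is well-formed).
SAMPLE = """<Response rid="1000" status="succeeded" moreData="false">
<Results completed="true" total="25" matched="5" processed="25">
<Resource type="h" DisplayName="Host" name="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
</Resource>
</Resource>
</Results>
</Response>"""


def resource_tree(element):
    """Return the element's attributes plus any nested <Resource> dicts.

    Hypothetical helper: nested resources are stored under the
    'resources' key, which is an illustrative name only.
    """
    info = dict(element.attrib)
    # findall("Resource") matches direct children only, not deeper levels
    children = [resource_tree(child) for child in element.findall("Resource")]
    if children:
        info["resources"] = children
    return info


root = ET.fromstring(SAMPLE)
# "Results/Resource" selects only the top-level resources under <Results>
top_level = [resource_tree(r) for r in root.findall("Results/Resource")]
for entry in top_level:
    print(entry)
```

Each printed dict holds the attributes of one top-level resource, with the attributes of its inner resources nested beneath it, so nothing is lost when a host has several VMs.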
Upvotes: 2