Reputation: 40140
I have to parse XML files that contain entries like
<error code="UnknownDevice">
<description />
</error>
which are defined elsewhere as
<group name="error definitions">
<errordef id="0x11" name="UnknownDevice">
<description>Indicated device is unknown</description>
</errordef>
...
</group>
Given the following setup:
import xml.etree.ElementTree as ET
parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)
tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()
How can I get those values for errordef? I mean the values of id and description.
How can I search for and extract those values using UnknownDevice?
[Update] The error groups have differing names, but always in the format "XXX error definitions", "YYY error definitions", etc.
Further, they seem to be nested at different depths in different documents.
Given the error's name, e.g. "UnknownDevice", how can I search everything under the root to get the corresponding id and description values?
Can I go directly to them using e.g. "UnknownDevice", or do I have to search for the error groups first?
Upvotes: 2
Views: 2099
Reputation: 13
If you want to get the values of description and id for every errordef element, you could do this:
import xml.etree.ElementTree as ET
tree = ET.parse('grpError.xml')
root = tree.getroot()
print(root)
docExe = root.findall('.//errordef')  # all errordef elements, at any depth below root
dict01 = docExe[0].attrib  # attributes of the first errordef, as a dict
print(dict01)
print(dict01['id'])  # value of the id attribute
print(dict01['name'])  # value of the name attribute
print(docExe[0].find('description').text)  # text of the child <description> element
Output is:
<Element 'group' at 0x000001A582EDB4A8>
{'id': '0x11', 'name': 'UnknownDevice'}
0x11
UnknownDevice
Indicated device is unknown
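Because the update says the groups sit at different depths, a .// path lets you skip locating the groups first: ElementTree's limited XPath support includes [@attr='value'] predicates, so you can jump straight to the errordef by name. A minimal sketch, assuming xml.etree.ElementTree; the sample document and the helper name lookup_error are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the errordef is nested two levels below the root.
SAMPLE = """<root>
  <section>
    <group name="device error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </section>
</root>"""


def lookup_error(root, name):
    """Return (id, description) for the errordef with this name, or None (hypothetical helper)."""
    # './/' searches at any depth; the predicate matches the name attribute exactly.
    # Assumes name contains no single quote, since it is spliced into the path.
    errordef = root.find(f".//errordef[@name='{name}']")
    if errordef is None:
        return None
    return errordef.get("id"), errordef.findtext("description")


root = ET.fromstring(SAMPLE)
print(lookup_error(root, "UnknownDevice"))
```

The match is case-sensitive, so "unknownDevice" would not find "UnknownDevice".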
Upvotes: 1
Reputation: 473753
First, parse the error definitions into a dictionary:
errors = {
    errordef.attrib["name"]: {"id": errordef.attrib.get("id"), "description": errordef.findtext("description")}
    for errordef in root.xpath(".//group[@name='error definitions']/errordef[@name]")
}
Then, every time you need to get the error id and description, look it up by code:
error_code = root.find("error").attrib["code"]
print(errors.get(error_code, "Unknown Error"))
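If a document contains many error elements, the same dictionary can resolve all of them in one pass. A sketch using the standard library, assuming the definitions dictionary has already been built as above; the sample document and the contents of errors below are hypothetical stand-ins:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two error references, one of them undefined.
SAMPLE = """<root>
  <error code="UnknownDevice"><description /></error>
  <error code="Timeout"><description /></error>
</root>"""

# Hypothetical stand-in for the dictionary built from the errordef elements.
errors = {
    "UnknownDevice": {"id": "0x11", "description": "Indicated device is unknown"},
}

root = ET.fromstring(SAMPLE)
# iter() visits every <error> element at any depth, not just direct children.
resolved = [(e.get("code"), errors.get(e.get("code"))) for e in root.iter("error")]
for code, info in resolved:
    print(code, info if info is not None else "Unknown Error")
```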
Note that the xpath() method comes from lxml.etree. If you are using xml.etree.ElementTree, replace xpath() with findall(); the limited XPath support provided by xml.etree.ElementTree is enough for the expressions above.
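Since the update says the group names vary ("XXX error definitions", "YYY error definitions") and ElementTree's XPath subset has no contains() or ends-with(), one option is to filter the groups in Python. A minimal sketch, assuming xml.etree.ElementTree; the helper name build_error_table and the sample document (including its second error and its "warnings" group) are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two error groups with different prefixes and depths,
# plus one unrelated group that should be ignored.
SAMPLE = """<root>
  <group name="XXX error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
  <section>
    <group name="YYY error definitions">
      <errordef id="0x12" name="DeviceBusy">
        <description>Indicated device is busy</description>
      </errordef>
    </group>
  </section>
  <group name="warnings">
    <errordef id="0x99" name="NotAnError" />
  </group>
</root>"""


def build_error_table(root):
    """Map errordef name -> {"id", "description"} for '... error definitions' groups."""
    table = {}
    # iter("group") walks the whole tree, so nesting depth does not matter.
    for group in root.iter("group"):
        if not group.get("name", "").endswith("error definitions"):
            continue
        # Direct errordef children that carry a name attribute.
        for errordef in group.findall("errordef[@name]"):
            table[errordef.get("name")] = {
                "id": errordef.get("id"),
                "description": errordef.findtext("description"),
            }
    return table


errors = build_error_table(ET.fromstring(SAMPLE))
print(errors["UnknownDevice"])
```

With lxml you could stay in XPath instead, e.g. //group[contains(@name, 'error definitions')]/errordef, since lxml implements XPath 1.0, which has contains() but not ends-with().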
Upvotes: 1
Reputation: 311238
If you have this:
<group name="error definitions">
<errordef id="0x11" name="UnknownDevice">
<description>Indicated device is unknown</description>
</errordef>
...
</group>
And you want to get the values of description and id for every errordef element, you could do this with lxml:
from lxml import etree
tree = etree.parse(inputFileName)
for err in tree.xpath('//errordef'):
    print(err.get('id'), err.find('description').text)
Which would give you something like:
0x11 Indicated device is unknown
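The snippet above relies on lxml's xpath() method. With xml.etree.ElementTree, which the question uses, iter() gives the same traversal over every errordef at any depth. A sketch under that assumption, with a hypothetical sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample based on the question's definitions.
SAMPLE = """<group name="error definitions">
  <errordef id="0x11" name="UnknownDevice">
    <description>Indicated device is unknown</description>
  </errordef>
</group>"""

root = ET.fromstring(SAMPLE)
rows = []
# iter() yields every errordef element in document order, at any depth.
for err in root.iter("errordef"):
    # findtext() returns None instead of raising when <description> is missing.
    rows.append((err.get("id"), err.findtext("description")))

for err_id, description in rows:
    print(err_id, description)
```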
Upvotes: 1
Reputation: 1133
You need a selector, though I'm not sure you can do this with lxml. It has CSS selectors, but I couldn't find anything in the docs for selecting by "id"... I have only used lxml to remove/add content in HTML. Maybe take a look at Scrapy? With Scrapy, once your document is loaded, it would look like this:
response.xpath('//errordef[@id="0x11"]/description/text()').extract()
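For plain XML files like the one in the question, Scrapy is not strictly required: xml.etree.ElementTree already understands attribute predicates such as [@id='0x11']. A minimal sketch with a hypothetical sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample based on the question's definitions.
SAMPLE = """<root>
  <group name="error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
</root>"""

root = ET.fromstring(SAMPLE)
# Description text of every errordef whose id attribute is 0x11, at any depth.
descriptions = [d.text for d in root.findall(".//errordef[@id='0x11']/description")]
print(descriptions)
```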
Upvotes: 0