Mawg

Reputation: 40140

Problems parsing an XML file with xml.etree.ElementTree

I have to parse xml files which contain entries like

<error code="UnknownDevice">
    <description />
</error>

which are defined elsewhere as

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

given

import xml.etree.ElementTree as ET

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)

tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()

How can I get the values for the matching errordef, i.e. the value of its id attribute and the text of its description element?

How can I search for and extract those values, using the error code UnknownDevice?


[Update] The error groups have differing names, but always of the format "XXX error definitions", "YYY error definitions", etc.

Further, they seem to be nested at different depths in different documents.

Given the error's name, e.g. "UnknownDevice", how can I search everything under the root to get the corresponding id and description values?

Can I go directly to them, using e.g. "UnknownDevice", or do I have to search for the error groups first?

Upvotes: 2

Views: 2099

Answers (4)

NITIN SRIVASTAV

Reputation: 13

If you want to get the value of description and id for an errordef element, you could do this:

import xml.etree.ElementTree as ET

tree = ET.parse('grpError.xml')
root = tree.getroot()
print(root)
docExe = root.findall('errordef')  # errordef elements that are direct children of the root
dict01 = docExe[0].attrib  # attributes of the first errordef, as a dictionary
print(dict01)
print(dict01['id'])  # value of the id attribute
print(dict01['name'])  # value of the name attribute
print(docExe[0].find('description').text)  # text of the child description element

Output is:

<Element 'group' at 0x000001A582EDB4A8>
{'id': '0x11', 'name': 'UnknownDevice'}
0x11
UnknownDevice
Indicated device is unknown
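
Note that root.findall('errordef') only matches direct children of the root, so the snippet above works when the root is the group element itself. When the groups are nested at different depths, Element.iter() may be a better fit. Here is a minimal sketch; the sample document, its wrapper elements and the second error (DeviceBusy, 0x12) are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: wrapper elements and the second errordef are assumptions.
SAMPLE = """
<config>
  <section>
    <group name="XXX error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </section>
  <group name="YYY error definitions">
    <errordef id="0x12" name="DeviceBusy">
      <description>Device is busy</description>
    </errordef>
  </group>
</config>
"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole tree in document order, regardless of nesting depth.
found = [
    (e.get("id"), e.get("name"), e.findtext("description"))
    for e in root.iter("errordef")
]
for item in found:
    print(item)
```

The same loop works on tree.getroot() after ET.parse(); only the way the document is loaded differs.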

Upvotes: 1

alecxe

Reputation: 473753

First, parse the error definitions into a dictionary:

errors = {
    errordef.attrib["name"]: {"id": errordef.attrib.get("id"), "description": errordef.findtext("description")}
    for errordef in root.xpath(".//group[@name='error definitions']/errordef[@name]")
}

Then, every time you need to get the error id and description, look it up by code:

error_code = root.find("error").attrib["code"]
print(errors.get(error_code, "Unknown Error"))

Note that the xpath() method comes from lxml.etree. If you are using xml.etree.ElementTree, replace xpath() with findall(); the limited XPath support provided by xml.etree.ElementTree is enough for the expressions used here.
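
If the group names vary ("XXX error definitions", "YYY error definitions", ...), an exact @name match will not find them. As far as I know, ElementTree's XPath subset has no ends-with(), so one option is to filter the groups in Python. This is only a sketch using the standard library; build_error_table and the sample document are hypothetical names and data, not part of the question:

```python
import xml.etree.ElementTree as ET


def build_error_table(root):
    """Map each errordef name to its id and description (hypothetical helper)."""
    errors = {}
    for group in root.iter("group"):  # groups at any depth
        # Filter by suffix in Python instead of in the path expression.
        if not group.get("name", "").endswith("error definitions"):
            continue
        for errordef in group.findall("errordef[@name]"):
            errors[errordef.get("name")] = {
                "id": errordef.get("id"),
                "description": errordef.findtext("description"),
            }
    return errors


# Hypothetical sample document; the wrapper elements are assumptions.
SAMPLE = """
<config>
  <log>
    <error code="UnknownDevice">
      <description />
    </error>
  </log>
  <section>
    <group name="XXX error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </section>
</config>
"""

root = ET.fromstring(SAMPLE)
errors = build_error_table(root)

# Look up the first <error> found anywhere under the root by its code.
error_code = root.find(".//error").get("code")
print(errors.get(error_code, "Unknown Error"))
```

Building the table once keeps later lookups cheap when a document contains many error entries.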

Upvotes: 1

larsks

Reputation: 311238

If you have this:

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

And you want to get the value of description and id for every errordef element, you could do this:

# tree.xpath() requires lxml.etree; with xml.etree.ElementTree use tree.findall('.//errordef')
for err in tree.xpath('//errordef'):
    print(err.get('id'), err.find('description').text)

Which would give you something like:

0x11 Indicated device is unknown
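
To go directly to a single definition by name, without locating the groups first, an attribute predicate should be enough. Below is a sketch with xml.etree.ElementTree; lookup_error and the sample document are hypothetical, and the helper assumes the name contains no single quote because it is spliced into the path:

```python
import xml.etree.ElementTree as ET


def lookup_error(root, name):
    """Return (id, description) for the errordef called `name`, or None."""
    # Assumption: `name` has no single quote, since it is inserted into the path.
    errordef = root.find(f".//errordef[@name='{name}']")
    if errordef is None:
        return None
    return errordef.get("id"), errordef.findtext("description")


# Hypothetical sample document; the wrapper elements are assumptions.
SAMPLE = """
<config>
  <section>
    <group name="XXX error definitions">
      <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
      </errordef>
    </group>
  </section>
</config>
"""

root = ET.fromstring(SAMPLE)
result = lookup_error(root, "UnknownDevice")
print(result)
```

The attribute match is case-sensitive, so searching for "unknownDevice" will not find an errordef named "UnknownDevice".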

Upvotes: 1

dkx22

Reputation: 1133

You need a selector, though I'm not sure you can do this with lxml. It has CSS selectors, but I can't find anything in the docs for selecting by "id"; I have only used lxml to add and remove elements in HTML. Maybe take a look at Scrapy? With Scrapy it would look something like this once you have loaded your document:

response.xpath('//div[@id="0x11"]/text()').extract()

Upvotes: 0
