shrishinde

Reputation: 3405

Python xml parsing using lxml

I have an XML file like the one below.

<sbe:messageSchema xmlns:sbe="http://www.fixprotocol.org/ns/simple/1.0"
                   description="something"
                   byteOrder="littleEndian">
  <sbe:message name="DummyMsg" id="99" description="Placeholder message.  Uses otherwise unused enums and composites so sbe compiles them.">
    <field name="msgType" id="1" type="MsgType" />
    <field name="minimumSbeSchemaVersion" id="2" type="MinimumSbeSchemaVersion"/>
  </sbe:message>

</sbe:messageSchema>

The XML file has multiple sbe:messageSchema records.

I tried using lxml.etree. If I do for child in root, I get description, byteOrder, etc., but not sbe:messageSchema. Also, if I try root.iterfind('./sbe:message'), I get an error saying something like the sbe prefix was not found.

I would like to get sbe:messageSchema and its fields. Please help.

Upvotes: 1

Views: 106

Answers (1)

ChuckB

Reputation: 898

The quick answer is that you need to pass a namespace map as the second argument to your .iterfind() call. A namespace map is simply a dictionary whose keys are namespace prefixes and whose values are namespace URIs.

Presumably you did something like this:

from lxml import etree

doc = etree.parse('messages.xml')
root = doc.getroot()
# No namespace map is passed here, so the sbe prefix cannot be resolved
for child in root.iterfind('./sbe:message'):
    print(child)

Because namespace prefixes are arbitrary and can be remapped to different URIs at any time, you need to tell lxml explicitly which namespace URI the prefix sbe is associated with. If you want to see the namespace declarations in the root element, do this:

root.nsmap

and you'll see this:

{'sbe': 'http://www.fixprotocol.org/ns/simple/1.0'}

So to simply reuse the namespace declarations from the root element:

from lxml import etree

doc = etree.parse('messages.xml')
root = doc.getroot()
# Reuse the root element's namespace declarations as the namespace map
for child in root.iterfind('./sbe:message', root.nsmap):
    print(child)

With your example XML data, this prints one sbe:message element:

<Element {http://www.fixprotocol.org/ns/simple/1.0}message at 0x7f8175b92ef0>
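To go one step further and reach the field elements inside each message, something like the following might work. This is only a minimal sketch: it parses a shortened inline copy of your sample instead of messages.xml, the helper name collect_messages is made up for illustration, and the namespace map is written out by hand rather than taken from root.nsmap. Note that the field elements in your sample carry no prefix, so they are looked up without one.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed;
    # the calls used below exist in both libraries.
    import xml.etree.ElementTree as etree

# Inline copy of the sample from the question (description shortened).
XML = """<sbe:messageSchema xmlns:sbe="http://www.fixprotocol.org/ns/simple/1.0"
                   description="something"
                   byteOrder="littleEndian">
  <sbe:message name="DummyMsg" id="99" description="Placeholder message.">
    <field name="msgType" id="1" type="MsgType" />
    <field name="minimumSbeSchemaVersion" id="2" type="MinimumSbeSchemaVersion"/>
  </sbe:message>
</sbe:messageSchema>"""

# Namespace map: prefix -> namespace URI.
NS = {"sbe": "http://www.fixprotocol.org/ns/simple/1.0"}


def collect_messages(xml_text):
    """Hypothetical helper: map each message name to its fields' attributes."""
    root = etree.fromstring(xml_text)
    result = {}
    for message in root.iterfind("./sbe:message", NS):
        # <field> is not in any namespace here, so no prefix is needed.
        fields = [dict(field.attrib) for field in message.iterfind("./field")]
        result[message.get("name")] = fields
    return result


messages = collect_messages(XML)
for name, fields in messages.items():
    print(name)
    for field in fields:
        print("   ", field["name"], field["id"], field["type"])
```

If your real file declares the namespace on the root element, as the sample does, passing root.nsmap in place of NS should behave the same way under lxml.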

I also sense some confusion about basic XML concepts. Be sure you understand the basic constraints of well-formedness, and the difference between the attributes of an element and its child elements. XML has no 'fields', only elements, attributes, comments, processing instructions, text nodes, and XML declarations. Namespaces are unfortunately complicated, but essential in many cases. Get a basic understanding of them.
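To make the attribute versus child element distinction concrete, here is a small sketch against a cut-down copy of your sample. The variable names are only illustrative. It also shows that the prefix you use in a search path is arbitrary: what matters is the URI it maps to, and the brace form {uri}name avoids prefixes altogether.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

XML = """<sbe:messageSchema xmlns:sbe="http://www.fixprotocol.org/ns/simple/1.0"
                   description="something" byteOrder="littleEndian">
  <sbe:message name="DummyMsg" id="99">
    <field name="msgType" id="1" type="MsgType"/>
  </sbe:message>
</sbe:messageSchema>"""

URI = "http://www.fixprotocol.org/ns/simple/1.0"
root = etree.fromstring(XML)

# Attributes belong to the element itself and are read through .attrib.
attributes = dict(root.attrib)

# Child elements are what iterating over an element yields.
child_tags = [child.tag for child in root]

# Three equivalent searches: the prefix is just a local alias for the URI.
by_prefix = root.findall("./sbe:message", {"sbe": URI})
by_other_prefix = root.findall("./m:message", {"m": URI})
by_braces = root.findall("./{%s}message" % URI)

print(root.tag)
print(attributes)
print(child_tags)
print(len(by_prefix), len(by_other_prefix), len(by_braces))
```

The namespace declaration itself (xmlns:sbe) does not show up among the attributes; it only determines how prefixed names are expanded.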

Upvotes: 1
