Sam_slo

Reputation: 141

Python3 cannot get XML element value with lxml.etree.find

I'm trying to handle a POST response that returns XML. The body arrives as bytes (b''):

<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>

Code:

from lxml import etree as et

root_node = et.fromstring(response.content)
print('{}'.format(root_node.find('.//returnType')))
return_type = root_node.find('.//returnType').text

The print statement prints None, so find().text raises an exception.

If I iterate through the children with a for loop, I get the nodes, but their tags carry a namespace that I cannot handle.

for tag in root_node.getchildren():
    print(tag)

<Element {http://something.com/Schema/V2/Result}returnType at 0x7f6c95542648>

How can I get the XML nodes and their values? I've tried the Stack Overflow answers for similar problems, but nothing works. I also tried using a regex to remove the schema, and adding a prefix to the namespace.

EDIT: I tried the answer below and get the following error, so I still cannot get the nodes.

    /usr/bin/python3 /home/samoa/Scripts/Python/lxml_test.py
Traceback (most recent call last):
  File "/home/samoa/Scripts/Python/lxml_test.py", line 17, in <module>
    print(root.find("returnType", root.nsmap).text)
  File "src/lxml/lxml.etree.pyx", line 1537, in lxml.etree._Element.find (src/lxml/lxml.etree.c:58520)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 288, in find
    it = iterfind(elem, path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 277, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 234, in _build_path_iterator
    raise ValueError("empty namespace prefix is not supported in ElementPath")
ValueError: empty namespace prefix is not supported in ElementPath

Upvotes: 0

Views: 744

Answers (1)

Georges Martin

Reputation: 1208

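Pass the namespace map to the find() method. As http://something.com/Schema/V2/Result is the default namespace in your document, root_node.nsmap already contains it. Note that nsmap stores the default namespace under the key None, and some lxml releases (including the one in the question's traceback) reject that key in find() with "ValueError: empty namespace prefix is not supported in ElementPath". If you hit that error, use the fully qualified {namespace}tag form shown in the test script further down. Where the None key is accepted, this is all you have to do: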

return_type_element = root_node.find('.//returnType', root_node.nsmap)

or:

return_type_element = root_node.find('returnType', root_node.nsmap)

Moreover, the str.format() in:

print('{}'.format(root_node.find('.//returnType')))

is unnecessary and can be shortened to:

return_type_element = root_node.find('returnType', root_node.nsmap)
print(return_type_element)

# <Element {http://something.com/Schema/V2/Result}returnType at 0x107c28bc0>

If, however, you want to print return_type_element as XML, use the lxml.etree.tostring() function:

print(ET.tostring(return_type_element))

# b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n    '

Thus, your return_type can be obtained through:

return_type = root_node.find('returnType', root_node.nsmap).text
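If passing root_node.nsmap raises the ValueError from the question's EDIT, one workaround is to bind the namespace URI to a prefix of your own and use that prefix in the path. This is only a minimal sketch: the prefix "r" is an arbitrary name I picked, and the stdlib fallback import is there just so the snippet runs where lxml is not installed.

```python
try:
    from lxml import etree as ET
except ImportError:
    # Fallback so the sketch runs without lxml; the calls used below
    # behave the same way in xml.etree.ElementTree.
    import xml.etree.ElementTree as ET

content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>
'''

root = ET.fromstring(content)

# "r" is an arbitrary prefix chosen for this sketch; only the URI has to
# match the xmlns value declared in the document.
ns = {"r": "http://something.com/Schema/V2/Result"}

return_type = root.find("r:returnType", ns).text
message = root.findtext(".//r:message", namespaces=ns)
code = root.findtext(".//r:code", namespaces=ns)

print(return_type, message, code)
```

The prefix does not have to appear in the XML itself. The lookup goes by namespace URI, so any non-empty prefix works as long as the dictionary maps it to the right URI.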

My test script is:

#!/usr/bin/env python3
from lxml import etree as ET

content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>
'''

root = ET.fromstring(content)
emptyns = root.nsmap[None]
print(root.find("{%s}returnType" % (emptyns)).text)

# step-by-step

root = ET.fromstring(content)
print("Root element: %s" % (root))

emptyns = root.nsmap[None]
print("Empty namespace: %s" % (emptyns))

return_type_element = root.find("{%s}returnType" % (emptyns))
print("<returnType> element: %s" % (return_type_element))
print("<returnType> element as XML: %s" % (ET.tostring(return_type_element)))

return_type = return_type_element.text
print('<returnType> text: %s' % (return_type))

# children

# Iterating an element yields its children; getchildren() is deprecated.
for element in root:
    print("Element tag (with namespace): %s" % (element.tag))
    _, _, tag = element.tag.rpartition("}")
    print("Element tag (without namespace): %s" % (tag))

Its output is:

ERROR
Root element: <Element {http://something.com/Schema/V2/Result}result at 0x102f63188>
Empty namespace: http://something.com/Schema/V2/Result
<returnType> element: <Element {http://something.com/Schema/V2/Result}returnType at 0x102f630c8>
<returnType> element as XML: b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n    '
<returnType> text: ERROR
Element tag (with namespace): {http://something.com/Schema/V2/Result}success
Element tag (without namespace): success
Element tag (with namespace): {http://something.com/Schema/V2/Result}returnType
Element tag (without namespace): returnType
Element tag (with namespace): {http://something.com/Schema/V2/Result}errors
Element tag (without namespace): errors
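Putting the pieces together, the whole response could be turned into a plain dictionary. Treat this as a rough sketch rather than a definitive implementation: parse_result and local_name are hypothetical helper names, the namespace URI is read off the root tag instead of being hard-coded, and tags are stripped with the same rpartition("}") trick as in the loop above.

```python
try:
    from lxml import etree as ET
except ImportError:
    # Fallback so the sketch runs without lxml installed.
    import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' part from a tag name."""
    return tag.rpartition("}")[2]


def parse_result(xml_bytes):
    """Hypothetical helper: flatten the <result> document into a dict."""
    root = ET.fromstring(xml_bytes)

    # Read the namespace URI from the root tag, e.g. '{uri}result' -> 'uri'.
    uri = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    prefix = "{%s}" % uri if uri else ""

    errors = []
    for error in root.iter(prefix + "error"):
        errors.append({
            local_name(child.tag): child.text
            for child in error
            if isinstance(child.tag, str)  # skip comments and similar nodes
        })

    return {
        "success": root.findtext(prefix + "success") == "true",
        "return_type": root.findtext(prefix + "returnType"),
        "errors": errors,
    }


sample = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
    <success>false</success>
    <returnType>ERROR</returnType>
    <errors>
        <error>
            <message>Invalid signature</message>
            <code>3002</code>
        </error>
    </errors>
</result>
'''

result = parse_result(sample)
print(result)
```

Building the fully qualified tag names from the root tag means the helper does not depend on how find() treats the None key in nsmap. It does assume that all elements of interest live in the same namespace as the root.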

Upvotes: 2
