Reputation: 141
I'm trying to handle a POST response that returns XML. The response body is stored as bytes (b''):
<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
<success>false</success>
<returnType>ERROR</returnType>
<errors>
<error>
<message>Invalid signature</message>
<code>3002</code>
</error>
</errors>
</result>
Code:
from lxml import etree as et
root_node = et.fromstring(response.content)
print('{}'.format(root_node.find('.//returnType')))
return_type = root_node.find('.//returnType').text
The print statement outputs None, so find().text raises an AttributeError.
If I iterate through the children with a for loop, I get the nodes, but their tags include the namespace, which I don't know how to handle.
for tag in root_node.getchildren():
    print(tag)
<Element {http://something.com/Schema/V2/Result}returnType at 0x7f6c95542648>
How can I get the XML nodes and their values? I've tried the Stack Overflow answers to similar problems, but nothing works. I also tried using a regex to remove the schema and adding a prefix to the namespace.
EDIT: I tried the answer and still cannot get the nodes. I get this error:
/usr/bin/python3 /home/samoa/Scripts/Python/lxml_test.py
Traceback (most recent call last):
  File "/home/samoa/Scripts/Python/lxml_test.py", line 17, in <module>
    print(root.find("returnType", root.nsmap).text)
  File "src/lxml/lxml.etree.pyx", line 1537, in lxml.etree._Element.find (src/lxml/lxml.etree.c:58520)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 288, in find
    it = iterfind(elem, path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 277, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/usr/local/lib/python3.6/dist-packages/lxml/_elementpath.py", line 234, in _build_path_iterator
    raise ValueError("empty namespace prefix is not supported in ElementPath")
ValueError: empty namespace prefix is not supported in ElementPath
Upvotes: 0
Views: 744
Reputation: 1208
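Pass the namespace map to the find() method. Because http://something.com/Schema/V2/Result is the default namespace in your document, that is all you have to do, provided your lxml version accepts a default (None-keyed) namespace in ElementPath. Versions that do not accept it raise "ValueError: empty namespace prefix is not supported in ElementPath"; in that case use the fully qualified "{namespace}tag" form shown in the test script further down: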
return_type_element = root_node.find('.//returnType', root_node.nsmap)
or:
return_type_element = root_node.find('returnType', root_node.nsmap)
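If your lxml version rejects the default namespace in the map, one possible workaround is to bind the namespace URI to an explicit prefix of your own. This is only a minimal sketch on a trimmed copy of your document: the prefix "r" and the NS dictionary are names made up for illustration, and the import fallback is there only so the snippet also runs with the standard library's ElementTree, which accepts the same find() arguments.

```python
try:
    from lxml import etree as ET
except ImportError:  # find() with a prefix map works the same way in the stdlib
    import xml.etree.ElementTree as ET

content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
<success>false</success>
<returnType>ERROR</returnType>
</result>
'''

# "r" is an arbitrary, hypothetical prefix; only the URI has to match the document.
NS = {"r": "http://something.com/Schema/V2/Result"}

root = ET.fromstring(content)
return_type = root.find("r:returnType", NS).text
success = root.find(".//r:success", NS).text
print(return_type, success)
```

With lxml you could also build the map from the document itself, for example {"r": root.nsmap[None]}, instead of hard-coding the URI.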
Moreover, the str.format() call in:
print('{}'.format(root_node.find('.//returnType')))
is unnecessary and can be shortened to:
return_type_element = root_node.find('returnType', root_node.nsmap)
print(return_type_element)
# <Element {http://something.com/Schema/V2/Result}returnType at 0x107c28bc0>
If, however, you want to print return_type_element as XML, use the lxml.etree.tostring() function:
print(ET.tostring(return_type_element))
# b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n '
Thus, your return_type can be obtained through:
return_type = root_node.find('returnType', root_node.nsmap).text
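Keep in mind that find() returns None when nothing matches, so chaining .text onto it raises an AttributeError, which is what happened in your original code. As a hedged alternative, findtext() accepts a default value to return instead. The sketch below assumes a one-line version of your document; NS and the "<not found>" placeholder are names chosen here for illustration.

```python
try:
    from lxml import etree as ET
except ImportError:  # findtext() has the same signature in the stdlib
    import xml.etree.ElementTree as ET

NS = "http://something.com/Schema/V2/Result"
content = (
    b'<result xmlns="http://something.com/Schema/V2/Result">'
    b'<returnType>ERROR</returnType>'
    b'</result>'
)
root = ET.fromstring(content)

# Fully qualified tag name in the form {namespace-uri}local-name.
return_type = root.findtext("{%s}returnType" % NS, default="")

# An unqualified name does not match a namespaced element,
# so the default is returned instead of raising an exception.
missing = root.findtext("returnType", default="<not found>")

print(return_type, missing)
```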
My test script is:
#!/usr/bin/env python3
from lxml import etree as ET
content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
<success>false</success>
<returnType>ERROR</returnType>
<errors>
<error>
<message>Invalid signature</message>
<code>3002</code>
</error>
</errors>
</result>
'''
root = ET.fromstring(content)
emptyns = root.nsmap[None]
print(root.find("{%s}returnType" % (emptyns)).text)
# step-by-step
root = ET.fromstring(content)
print("Root element: %s" % (root))
emptyns = root.nsmap[None]
print("Empty namespace: %s" % (emptyns))
return_type_element = root.find("{%s}returnType" % (emptyns))
print("<returnType> element: %s" % (return_type_element))
print("<returnType> element as XML: %s" % (ET.tostring(return_type_element)))
return_type = return_type_element.text
print('<returnType> text: %s' % (return_type))
# children
# Iterating over the element yields its children; getchildren() is deprecated.
for element in root:
    print("Element tag (with namespace): %s" % (element.tag))
    _, _, tag = element.tag.rpartition("}")
    print("Element tag (without namespace): %s" % (tag))
Its result is:
ERROR
Root element: <Element {http://something.com/Schema/V2/Result}result at 0x102f63188>
Empty namespace: http://something.com/Schema/V2/Result
<returnType> element: <Element {http://something.com/Schema/V2/Result}returnType at 0x102f630c8>
<returnType> element as XML: b'<returnType xmlns="http://something.com/Schema/V2/Result">ERROR</returnType>\n '
<returnType> text: ERROR
Element tag (with namespace): {http://something.com/Schema/V2/Result}success
Element tag (without namespace): success
Element tag (with namespace): {http://something.com/Schema/V2/Result}returnType
Element tag (without namespace): returnType
Element tag (with namespace): {http://something.com/Schema/V2/Result}errors
Element tag (without namespace): errors
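The same rpartition("}") trick can be used to collect the nested error entries. The following is just a sketch under the assumption that every <error> element contains only simple text children, as in your sample; local_name() and the NS constant are hypothetical helpers introduced for this example, not part of lxml.

```python
try:
    from lxml import etree as ET
except ImportError:  # iter() with a qualified tag also works in the stdlib
    import xml.etree.ElementTree as ET

NS = "http://something.com/Schema/V2/Result"
content = b'''<?xml version="1.0" encoding="utf-8"?>
<result xmlns="http://something.com/Schema/V2/Result">
<success>false</success>
<returnType>ERROR</returnType>
<errors>
<error>
<message>Invalid signature</message>
<code>3002</code>
</error>
</errors>
</result>
'''


def local_name(tag):
    """Drop a leading '{namespace}' part from a tag name, if there is one."""
    return tag.rpartition("}")[2]


root = ET.fromstring(content)

# One dict per <error>, keyed by the child tag names without the namespace.
errors = [
    {local_name(field.tag): field.text for field in error}
    for error in root.iter("{%s}error" % NS)
]
print(errors)
```

Note that the values stay strings, so convert the code with int() yourself if you need a number.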
Upvotes: 2