Reputation: 865
I am using xml.etree.ElementTree.tostring() to convert an etree element to a string, but sometimes it fails:
import xml.etree.ElementTree
from lxml import etree

xpath = "..."
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpath)
xml.etree.ElementTree.tostring(result[0], encoding='utf-8')
The error is:
  File "../abc.py", line 165, in abc
    results.append(xml.etree.ElementTree.tostring(result[0], encoding='utf-8'))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1127, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 818, in write
    self._root, encoding, default_namespace
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 887, in _namespaces
    _raise_serialization_error(tag)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1053, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <built-in function Comment> (type builtin_function_or_method)
How can I resolve it?
Upvotes: 2
Views: 4100
Reputation: 80346
It looks like result[0] is a comment node, which you probably want to skip. Something like this should do:
htmlparser = etree.HTMLParser(remove_comments=True)
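To see the difference the flag makes, here is a minimal sketch. It assumes Python 3 with lxml installed; the sample HTML and the variable names are made up for illustration and are not from the question.

```python
from io import BytesIO
from lxml import etree

# Hypothetical sample document: the comment is the first child of <body>.
html = b"<html><body><!-- a comment --><p>Hello</p></body></html>"

# The default parser keeps comments as nodes in the tree.
default_tree = etree.parse(BytesIO(html), etree.HTMLParser())
default_nodes = default_tree.xpath("//body/node()")

# With remove_comments=True the comment never enters the tree.
clean_tree = etree.parse(BytesIO(html), etree.HTMLParser(remove_comments=True))
clean_nodes = clean_tree.xpath("//body/node()")

# A comment node's tag is the etree.Comment factory, not a string,
# which is what xml.etree.ElementTree.tostring() fails to serialize.
first_is_comment = default_nodes[0].tag is etree.Comment
```

With the default parser the XPath result starts with the comment node; with remove_comments=True only the <p> element comes back.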
From the docs:
ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.
You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.
Upvotes: 2