hoangvu68

Reputation: 865

Python: xml.etree.ElementTree.tostring error

I am using xml.etree.ElementTree.tostring() to convert an etree element to a string, but sometimes it fails:

import xml.etree.ElementTree
from lxml import etree

xpath = "..."
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpath)
xml.etree.ElementTree.tostring(result[0], encoding='utf-8')

The error is:

File "../abc.py", line 165, in abc
    results.append(xml.etree.ElementTree.tostring(result[0], encoding='utf-8'))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1127, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 818, in write
    self._root, encoding, default_namespace
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 887, in _namespaces
    _raise_serialization_error(tag)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1053, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <built-in function Comment> (type builtin_function_or_method)

How can I resolve it?

Upvotes: 2

Views: 4100

Answers (1)

root

Reputation: 80346

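It looks like result[0] is a comment node, which the standard library's tostring() cannot serialize. You probably want to skip comments altogether. Telling the parser to drop them should do it: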

etree.HTMLParser(remove_comments=True)
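
To make that concrete, here is a minimal sketch, assuming lxml is installed. The sample HTML and the //body/node() XPath are made up for illustration, since the XPath in the question is not shown. It shows the default parser handing back a comment node, and the remove_comments=True parser leaving only elements, which are then serialized with lxml's own etree.tostring:

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document; the real response is not shown in the question.
html = b"<html><body><!-- a comment --><p>Hello</p></body></html>"

# Default HTMLParser: comments are kept as nodes in the tree.
tree = etree.parse(BytesIO(html), etree.HTMLParser())
nodes = tree.xpath("//body/node()")
# A comment node is recognizable because its tag is the etree.Comment factory.
has_comment = any(n.tag is etree.Comment for n in nodes)

# Parser configured to drop comments while parsing.
parser = etree.HTMLParser(remove_comments=True)
clean_tree = etree.parse(BytesIO(html), parser)
clean_nodes = clean_tree.xpath("//body/node()")

# Serialize with lxml's tostring rather than xml.etree.ElementTree.tostring.
serialized = [etree.tostring(n, encoding="utf-8") for n in clean_nodes]
```

If you would rather keep the comments, lxml's etree.tostring can serialize comment nodes too, so swapping it in for xml.etree.ElementTree.tostring is another option.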

From the docs:

ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.

You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.
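
The difference described in that passage can be seen in a short sketch, again assuming lxml is installed. The XML snippet is invented for illustration. It parses the same document with the standard library, with lxml's default parser, and with etree.ETCompatXMLParser:

```python
import xml.etree.ElementTree as ET

from lxml import etree

# Hypothetical sample document with a comment between elements.
xml_doc = b"<root><!-- note --><item>text</item></root>"

# Standard library ElementTree: the comment is dropped during parsing.
std_tags = [child.tag for child in ET.fromstring(xml_doc)]

# lxml default XMLParser: the comment shows up as a child node.
default_children = list(etree.fromstring(xml_doc))

# lxml ETCompatXMLParser: comments are removed, as in ElementTree.
compat_children = list(etree.fromstring(xml_doc, etree.ETCompatXMLParser()))
compat_tags = [child.tag for child in compat_children]
```

With the default lxml parser, root has two children and the first one has tag etree.Comment. With the other two parsers only the item element remains.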

Upvotes: 2
