Jonas Liddell

Reputation: 31

Parsing XML and HTML in the same project

I want to parse both XML and HTML files in the same project.

I tried this:

from xml.etree import ElementTree as ET

tree = ET.parse(fpath)
html_file = ET.parse(htmlpath)

and got this error:

Traceback (most recent call last):
  File "C:.py", line 55, in <module>
    html_file = ET.parse("htmlpath")
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: undefined entity &nbsp;: line 690, column 78

Upvotes: 0

Views: 115

Answers (1)

Guido U. Draheim

Reputation: 3271

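The &nbsp; in the error message is a standard HTML5 named entity, but XML itself predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;), so the XML parser reports it as undefined. It may help to convert such entities to their Unicode characters before running the XML parser. In Python 3.4+ you can use html.unescape for that.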

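import re
from html import escape, unescape
# text holds the file contents as a str; each entity is decoded, then XML-reserved characters are re-escaped
textXML = re.sub(r"&\w+;", lambda m: escape(unescape(m.group(0))), text)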
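
To show how that substitution fits in front of the parser, here is a minimal sketch. The helper name entities_to_xml_safe and the sample string are made up for illustration; in the question the text would come from reading the HTML file. It assumes the markup is otherwise well-formed XML, since ElementTree will still reject things like unclosed tags.

```python
import re
from html import escape, unescape
from xml.etree import ElementTree as ET


def entities_to_xml_safe(text):
    """Replace named entities such as &nbsp; with their characters,
    re-escaping the ones XML reserves (&, <, >, quotes)."""
    return re.sub(r"&\w+;", lambda m: escape(unescape(m.group(0))), text)


# Hypothetical sample standing in for the contents of the HTML file.
sample = "<p>a&nbsp;b &amp; c &lt;d&gt;</p>"

# Parse from a string instead of a path, after the entities are converted.
root = ET.fromstring(entities_to_xml_safe(sample))
print(repr(root.text))
```

Wrapping unescape in escape is what keeps &amp; and &lt; valid: they are decoded and then encoded again, while &nbsp; ends up as a plain U+00A0 character that the XML parser accepts.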

Upvotes: 0
