Is there any way to ignore mismatched tags in Python's xml.etree.ElementTree.XMLParser?
If there are mismatched tags, then the input that you are processing is not XML by definition (since it is not well-formed). There is no way to "ignore" mismatched tags with ElementTree.
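For comparison, here is a minimal sketch of what the standard library does with the ill-formed snippet used in this answer. `parse_strict` is a hypothetical helper for illustration only; `ET.ParseError` and its `position` attribute are part of `xml.etree.ElementTree`. Parsing stops with an exception, and the most you get back is the location of the problem.

```python
import xml.etree.ElementTree as ET

# Ill-formed input: <foo> is closed by a mismatched </bar> tag.
BADINPUT = """
<root>
<foo>ABC</bar>
<baz>DEF</baz>
</root>"""

def parse_strict(text):
    """Hypothetical helper: return (root, None) on success, (None, error) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the offending markup.
        return None, err

root, err = parse_strict(BADINPUT)
print(err)           # expat's message, including line and column
print(err.position)  # (line, column)
```

ElementTree has no switch to continue past this error; catching `ParseError` only lets you report where the input is broken.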
The XMLParser class in the lxml library has a recover constructor argument (see http://lxml.de/api/lxml.etree.XMLParser-class.html). When recover=True, lxml will try to fix ill-formed input. Example:
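from lxml import etree
# Ill-formed input: <foo> is closed by a mismatched </bar> tag.
BADINPUT = """
<root>
<foo>ABC</bar>
<baz>DEF</baz>
</root>"""
# recover=True makes the underlying libxml2 parser repair the input instead of raising an error.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)
# encoding="unicode" returns a str (not bytes), so print() shows plain XML in Python 3.
print(etree.tostring(root, encoding="unicode"))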
Output (the bad </bar> end tag has been changed to </foo>):
<root>
<foo>ABC</foo>
<baz>DEF</baz>
</root>
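Because recovery silently rewrites the document, it can be worth checking what was changed. A minimal sketch, assuming lxml is installed: after a parse, the parser's `error_log` holds the problems that were logged during that run, and each entry has `line` and `message` attributes.

```python
from lxml import etree

# Same ill-formed input as above.
BADINPUT = """
<root>
<foo>ABC</bar>
<baz>DEF</baz>
</root>"""

parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)

# Each problem libxml2 worked around during recovery is recorded in the error log.
for entry in parser.error_log:
    print(entry.line, entry.message)

# The recovered tree keeps <foo> and <baz> as siblings under <root>.
print([child.tag for child in root])
```

An empty log means the input was well-formed and nothing was altered; a non-empty one tells you the result may not match what the author of the input intended.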