Dhakchianandan

Reputation: 351

Ignore mismatched tag in xml.etree.ElementTree.XMLParser Python

Is there any way to ignore mismatched tags in Python's xml.etree.ElementTree.XMLParser?

Upvotes: 4

Views: 7341

Answers (1)

mzjn

Reputation: 51012

If there are mismatched tags, then the input that you are processing is not XML by definition (since it is not well-formed). There is no way to "ignore" mismatched tags with ElementTree.
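To see this in practice, here is a minimal sketch using only the standard library. It feeds the same ill-formed snippet used in the lxml example to xml.etree.ElementTree.fromstring. Parsing stops at the first well-formedness error and raises ParseError, whose position attribute gives the (line, column) of the problem. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Ill-formed input: <foo> is closed by </bar>.
BADINPUT = """
<root>
  <foo>ABC</bar>
  <baz>DEF</baz>
</root>"""

caught = None
try:
    ET.fromstring(BADINPUT)
except ET.ParseError as err:
    caught = err  # keep a reference; `err` is unbound after the except block
    line, column = err.position  # 1-based line, 0-based column
    print(f"ParseError at line {line}, column {column}: {err}")
```

ElementTree has no option to skip past this error, so the only choices are to fix the input beforehand or to use a parser that supports recovery.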


The XMLParser class in the lxml library has a recover constructor argument (see http://lxml.de/api/lxml.etree.XMLParser-class.html). When recover=True, lxml will try to fix ill-formed input. Example:

from lxml import etree

BADINPUT = """
<root> 
  <foo>ABC</bar> 
  <baz>DEF</baz> 
</root>"""

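# recover=True tells libxml2 to repair ill-formed input instead of raising an error
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)
print(etree.tostring(root, encoding="unicode"))  # Python 3: print() is a function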

Output (the bad </bar> end tag has been changed to </foo>):

<root> 
  <foo>ABC</foo>
  <baz>DEF</baz> 
</root>
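Because recovery silently rewrites the document, it can be useful to check what was repaired. A sketch of one way to do that, assuming lxml is installed: the parser's error_log attribute holds the problems libxml2 encountered during the last parse, even when recover=True suppressed the exception. The loop and printed format are illustrative only.

```python
from lxml import etree

BADINPUT = """
<root>
  <foo>ABC</bar>
  <baz>DEF</baz>
</root>"""

parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)

# Each log entry describes one problem libxml2 recovered from.
for entry in parser.error_log:
    print(f"line {entry.line}: {entry.message}")

# After recovery, <foo> and <baz> are siblings under <root>.
print([child.tag for child in root])
```

If the log is non-empty, the tree you got back is lxml's best guess at what the input meant, not necessarily what its author intended.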

Upvotes: 7
