guettli

Reputation: 27969

Get better parse error message from ElementTree

If I try to parse broken XML, the exception shows only the line and column number. Is there a way to show the XML context as well?

I want to see the XML tags before and after the broken part.

Example:

import xml.etree.ElementTree as ET
tree = ET.fromstring('<a><b></a>')

Exception:

Traceback (most recent call last):
  File "tmp/foo.py", line 2, in <module>
    tree = ET.fromstring('<a><b></a>')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8

Something like this would be nice:

ParseError:
<a><b></a>
=====^

Upvotes: 7

Views: 18301

Answers (2)

unutbu

Reputation: 880389

You could make a helper function to do this:

import sys
import io
import itertools as IT
import xml.etree.ElementTree as ET
PY2 = sys.version_info[0] == 2
StringIO = io.BytesIO if PY2 else io.StringIO

def myfromstring(content):
    try:
        tree = ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position
        # lineno is 1-based, so skip lineno - 1 lines to reach the failing one
        line = next(IT.islice(StringIO(content), lineno - 1, None)).rstrip('\r\n')
        caret = '{:=>{}}'.format('^', column)
        err.msg = '{}\n{}\n{}'.format(err, line, caret)
        raise
    return tree

myfromstring('<a><b></a>')

yields

xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
<a><b></a>
=======^
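
If you also want the lines before and after the failing one, as the question asks, the same idea extends to a small window of lines. This is only a minimal sketch, assuming `content` is a text string rather than bytes; `parse_with_context` and its `context` parameter are names made up for this example:

```python
import xml.etree.ElementTree as ET


def parse_with_context(content, context=2):
    """Parse XML text; on failure, re-raise with nearby lines in the message.

    `parse_with_context` and `context` are hypothetical names for this sketch.
    """
    try:
        return ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position  # lineno counts from 1
        lines = content.splitlines()
        # Window of up to `context` lines on either side of the failing line.
        first = max(lineno - 1 - context, 0)
        last = min(lineno + context, len(lines))
        shown = []
        for number in range(first, last):
            shown.append(lines[number])
            if number == lineno - 1:
                # Same caret style as above: '=' padding, '^' at width `column`.
                shown.append('{:=>{}}'.format('^', column))
        err.msg = '{}\n{}'.format(err, '\n'.join(shown))
        raise
```

The caret line is inserted directly under the failing line, so the lines that follow it in the message are the trailing context.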

Upvotes: 16

Kobi K

Reputation: 7931

It's not the best option, but it's quick and simple: parse the ParseError message, extract the line and column, and use them to show where the problem is.

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError
my_string = '<a><b><c></b></a>'
try:
    tree = ET.fromstring(my_string)
except ParseError as e:
    formatted_e = str(e)
    line = int(formatted_e[formatted_e.find("line ") + 5: formatted_e.find(",")])
    column = int(formatted_e[formatted_e.find("column ") + 7:])
    split_str = my_string.split("\n")
    # print(...) with a single argument works on both Python 2 and Python 3
    print("{}\n{}^".format(split_str[line - 1], len(split_str[line - 1][0:column])*"-"))

Note: splitting on \n is only for this example; split the input in whatever way matches the line endings of your data.
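
A possible variation that avoids slicing the message text: the exception also carries the line and column as `e.position`, and `str.splitlines()` copes with the usual line endings. This is a rough sketch, not a drop-in replacement; `point_at_error` is a hypothetical helper name, and it assumes the input is a text string:

```python
import xml.etree.ElementTree as ET


def point_at_error(xml_text):
    """Return None if xml_text parses, else the bad line plus a marker line.

    `point_at_error` is a hypothetical helper name for this sketch.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as e:
        line, column = e.position  # same numbers that appear in str(e)
        split_str = xml_text.splitlines()  # handles \n, \r\n and \r
        # The reported line can lie past the last one (e.g. error at end of input).
        bad_line = split_str[line - 1] if line <= len(split_str) else ''
        return "{}\n{}^".format(bad_line, "-" * len(bad_line[:column]))
    return None


print(point_at_error('<a><b><c></b></a>'))
```

Returning the text instead of printing it lets the caller decide whether to log it, print it, or attach it to another exception.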

Upvotes: 2
