PKlumpp
PKlumpp

Reputation: 5233

Identify corrupt .xml in python

I have a small python script that is reading a couple of .XML files. Now i have to assert that those .XML files are not corrupted in any way. How can I check this? What I do to read them is:

xml_tree = ET.parse(path) //path = path to .xml
xml_file = xml_tree.getroot()

Upvotes: 0

Views: 1737

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1123500

ET.parse() raises a ParseError exception if the XML file is corrupt:

>>> print open('test.xml').read()
This is not an XML file

>>> from xml.etree import ElementTree as ET
>>> ET.parse('test.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

Simply catch that exception:

try:
    ET.parse(path)
except ET.ParseError:
    print('{} is corrupt'.format(path))

Upvotes: 2

Related Questions