Reputation: 5233
I have a small Python script that reads a couple of .xml files. Now I have to assert that those files are not corrupted in any way. How can I check this? This is how I read them:
xml_tree = ET.parse(path)  # path = path to the .xml file
xml_file = xml_tree.getroot()
Upvotes: 0
Views: 1737
Reputation: 1123500
ET.parse() raises a ParseError exception if the XML file is corrupt:
>>> print open('test.xml').read()
This is not an XML file
>>> from xml.etree import ElementTree as ET
>>> ET.parse('test.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
Simply catch that exception:
try:
    ET.parse(path)
except ET.ParseError:
    print('{} is corrupt'.format(path))
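If you have several files to check, the same try/except can be wrapped in a small helper. This is only a sketch: the function name find_corrupt_xml is made up for illustration, and it assumes "corrupt" means "not well-formed XML", since ET.parse() does not validate against a schema. A missing or unreadable file raises an OSError rather than a ParseError, so that case is deliberately left uncaught here.

```python
from xml.etree import ElementTree as ET


def find_corrupt_xml(paths):
    """Return a list of (path, error message) pairs for files that fail to parse.

    Hypothetical helper: only well-formedness is checked, nothing more.
    """
    corrupt = []
    for path in paths:
        try:
            ET.parse(path)
        except ET.ParseError as err:
            # The message includes the line and column where parsing failed.
            corrupt.append((path, str(err)))
    return corrupt
```

An empty list means every file parsed; otherwise each entry tells you which file failed and where.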
Upvotes: 2