Reputation: 20460
I have some xml pieces like this:
<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
<player_birthday>1979-09-23</player_birthday>
<player_name>Orene Ai'i</player_name>
<player_team>Blues</player_team>
<player_id>453</player_id>
<player_height>170</player_height>
<player_position>F&W</player_position> <---- a '&' here.
<player_weight>75</player_weight>
</record>
Is there any way to validate whether the xml pieces is well-formatted? Is there any way to validate the xml against a DTD or XML Scheme?
For various reasons I can't use any third-party packages.
e.g. the xml above is not conrrect since it has a '&' in it. Note that the DOCTYPE definition sentence refer to a DTD.
Upvotes: 25
Views: 30652
Reputation: 110506
Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring) - it will raise an error if the XML is not well formed.
>>> a = """<record>
... <player_birthday>1979-09-23</player_birthday>
... <player_name>Orene Ai'i</player_name>
... <player_team>Blues</player_team>
... <player_id>453</player_id>
... <player_height>170</player_height>
... <player_position>F&W</player_position> <---- a '&' here.
... <player_weight>75</player_weight>
... </record>"""
>>>
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
parser.feed(text)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
Upvotes: 41
Reputation: 55253
You can use python's xml.dom.minidom
XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml
).
Just do:
import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')
You will get a xml.parsers.expat.ExpatError
if the XML is invalid.
Upvotes: 9