WoooHaaaa
WoooHaaaa

Reputation: 20460

How to validate xml using python without third-party libs?

I have some xml pieces like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to validate whether the xml pieces is well-formatted? Is there any way to validate the xml against a DTD or XML Scheme?

For various reasons I can't use any third-party packages.

e.g. the xml above is not conrrect since it has a '&' in it. Note that the DOCTYPE definition sentence refer to a DTD.

Upvotes: 25

Views: 30652

Answers (2)

jsbueno
jsbueno

Reputation: 110506

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring) - it will raise an error if the XML is not well formed.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24

Upvotes: 41

Thomas Orozco
Thomas Orozco

Reputation: 55253

You can use python's xml.dom.minidom XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml).

Just do:

import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')

You will get a xml.parsers.expat.ExpatError if the XML is invalid.

Upvotes: 9

Related Questions