Reputation: 773
I am writing a program that will receive a string via stdin. The string will always have the root node <Exam> and two child nodes: <Question> and <Answer>. What's a function that will validate that the XML is properly formatted (is not missing any tags or angle brackets)?
I've tried using etree but am running into errors:
import lxml
from lxml import etree

def isProperlyFormattedXML():
    parser = etree.XMLParser(dtd_validation=True)
    schema_root = etree.XML('''\
<Exam>
    <Question type="Short Response">
        What does OOP stand for?
    </Question>
    <Answer type="Short Response">
        "Object Oriented programming"
    </Answer>
</Exam>
''')
    schema = etree.XMLSchema(schema_root)

    #Good xml
    parser = etree.XMLParser(schema = schema)
    try:
        root = etree.fromstring("<a>5</a>", parser)
        print ("Finished validating good xml")
        return True
    except lxml.etree.XMLSyntaxError as err:
        print (err)

    #Bad xml
    parser = etree.XMLParser(schema = schema)
    try:
        root = etree.fromstring("<a>5<b>foobar</b></a>", parser)
    except lxml.etree.XMLSyntaxError as err:
        print (err)
        return False
Error:
lxml.etree.XMLSchemaParseError: The XML document 'in_memory_buffer' is not a schema document.
Upvotes: 3
Views: 6220
Reputation: 142
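You have the solution already: parsing is itself the well-formedness check, so wrap the parse call in try/except and treat a syntax error as "not properly formatted". You don't need a schema for that. The XMLSchemaParseError comes from passing a sample document to etree.XMLSchema, which expects an XSD schema document.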
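The try/except approach might look roughly like the sketch below. It makes a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml (with lxml the analogous exception would be etree.XMLSyntaxError), the function name is_well_formed_exam is made up for illustration, and the check for the Exam, Question and Answer elements is an extra step based on the layout described in the question.

```python
import xml.etree.ElementTree as ET


def is_well_formed_exam(xml_text):
    """Return True if xml_text parses and has the expected Exam layout."""
    try:
        # Parsing fails on missing tags, unbalanced angle brackets, etc.
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        print(err)
        return False
    # Assumed layout from the question: <Exam> root with <Question> and <Answer> children.
    return (
        root.tag == "Exam"
        and root.find("Question") is not None
        and root.find("Answer") is not None
    )
```

To validate input from stdin, something like is_well_formed_exam(sys.stdin.read()) should work after importing sys. If only well-formedness matters, the final structural check can be dropped and the function can return True once parsing succeeds.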
Upvotes: 2