Reputation: 767
I am working with XML data for a homework assignment, but I'm not sure how to tell whether the XML is valid. This is my code so far. I think I'm supposed to parse the XML, but I'm confused about how to validate it.
# XML data sets:
# http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html
# standard includes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
%matplotlib inline
# import the lxml parser
from lxml import etree
dtd = etree.DTD('ebay.dtd')
ebay = etree.parse('ebay.xml')
Upvotes: 1
Views: 44
Reputation: 5039
lxml raises lxml.etree.XMLSyntaxError when it is asked to parse XML that is not well-formed.
So you could handle it like this:
xml_fname = 'ebay.xml'
try:
    ebay = etree.parse(xml_fname)
except etree.XMLSyntaxError:
    # raised when the file is not well-formed XML
    print("Failed to parse malformed XML: " + xml_fname)
If you want to validate against a DTD:
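# validate() accepts the parsed tree directly and returns True or False
print(dtd.validate(ebay))
# if validation fails, the reasons are recorded in dtd.error_log
print(dtd.error_log.filter_from_errors())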
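To see the difference between "not well-formed" (a parse error) and "not valid" (a DTD violation), here is a small self-contained sketch. It does not use the eBay files: the tiny DTD, the sample documents, and the helper name check are all made up for illustration, so treat it as one possible way to combine the two checks rather than the required approach.

```python
from io import BytesIO

from lxml import etree

# Hypothetical DTD for illustration: a <listing> holds one or more <item> elements.
DTD_TEXT = b"""
<!ELEMENT listing (item+)>
<!ELEMENT item (#PCDATA)>
"""


def check(xml_bytes, dtd):
    """Return (well_formed, valid, messages) for an XML document given as bytes."""
    try:
        root = etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError as exc:
        # The parser could not read the document at all.
        return False, False, [str(exc)]
    # The document parsed; now check its structure against the DTD.
    valid = dtd.validate(root)
    messages = [entry.message for entry in dtd.error_log.filter_from_errors()]
    return True, valid, messages


dtd = etree.DTD(BytesIO(DTD_TEXT))

good = check(b"<listing><item>a</item></listing>", dtd)
bad_structure = check(b"<listing><other/></listing>", dtd)
not_well_formed = check(b"<listing><item></listing>", dtd)

for name, result in [("good", good),
                     ("bad_structure", bad_structure),
                     ("not_well_formed", not_well_formed)]:
    print(name, result[0], result[1])
```

The first document passes both checks, the second parses but breaks the DTD, and the third fails at the parsing step, so DTD validation is never reached for it.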
Upvotes: 1