jhaywoo8

Reputation: 767

How to tell if XML data is valid

I'm working with XML data for a homework assignment, but I'm not sure how to tell whether the XML is valid. I think I'm supposed to parse the XML, but I'm confused about how to validate it. This is my code so far:

# XML data sets:
# http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html


# standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

%matplotlib inline


# import the lxml parser
from lxml import etree

dtd = etree.DTD('ebay.dtd')
ebay = etree.parse('ebay.xml') 

Upvotes: 1

Views: 44

Answers (1)

DBedrenko

Reputation: 5039

lxml raises lxml.etree.XMLSyntaxError when it is asked to parse XML that is not well-formed.

So you could handle this like:

xml_fname = 'ebay.xml'
try:
    ebay = etree.parse(xml_fname)
except etree.XMLSyntaxError:
    # Raised when the file is not well-formed XML
    print("Failed to parse malformed XML: " + xml_fname)

If you want to validate against a DTD:

# validate() accepts the parsed tree directly and returns True or False
print(dtd.validate(ebay))
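
To make the difference between "not well-formed" and "not valid against the DTD" concrete, here is a minimal sketch that runs without any files. The DTD text, the sample documents, and the check_xml helper are all hypothetical names made up for illustration; they are not taken from ebay.dtd or ebay.xml.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD for illustration: a <listing> holds exactly one <price>.
DTD_TEXT = "<!ELEMENT listing (price)><!ELEMENT price (#PCDATA)>"


def check_xml(xml_text, dtd):
    """Classify an XML string as 'malformed', 'invalid', or 'valid'."""
    try:
        root = etree.fromstring(xml_text)
    except etree.XMLSyntaxError:
        # Not well-formed, so there is no tree to validate.
        return "malformed"
    # validate() returns True or False; details end up in dtd.error_log.
    if not dtd.validate(root):
        return "invalid"
    return "valid"


dtd = etree.DTD(StringIO(DTD_TEXT))

samples = [
    "<listing><price>10</price></listing>",  # matches the DTD
    "<listing><title>x</title></listing>",   # well-formed, but breaks the DTD
    "<listing><price>10</listing>",          # <price> is never closed
]
results = [check_xml(doc, dtd) for doc in samples]
print(results)
```

When validate() returns False, printing dtd.error_log should show why the document was rejected, which is usually more helpful than the bare boolean.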

Upvotes: 1
