Reputation: 7789
I have the following XML, which is invalid (invalid DTD/XML):
<city>
<address>
<zipcode>4455</zipcode>
</address>
I'm trying to parse it with lxml like this:
from lxml import etree as ET
parser = ET.XMLParser(dtd_validation=False)
tree = ET.fromstring(xml_data,parser)
print(tree.xpath('//zipcode'))
Unfortunately, this code still raises XML errors.
Any idea how I can get a non-validating parse of the above XML?
Upvotes: 0
Views: 1355
Reputation: 89295
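Assuming that by 'invalid dtd' you meant that the <city> tag is not closed in the XML sample above, then your document is not merely invalid: it is not well-formed, so strictly speaking it isn't XML at all, because it doesn't follow the XML rules. Turning off DTD validation does not help here, since well-formedness is checked regardless of validation.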
You need to fix the document somehow before it can be treated as XML. For this simple unclosed-tag case, setting recover=True on the parser will do the job:
from lxml import etree as ET
parser = ET.XMLParser(recover=True)
tree = ET.fromstring(xml_data,parser)
print(tree.xpath('//zipcode'))
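For reference, here is a self-contained sketch of the same idea using the sample from the question. The contents of xml_data are copied from the question as an assumption about what the real input looks like, and the strict-parse attempt is only there to show the difference between the two parsers. With recover=True the parser tries to repair the document, so it is worth looking at parser.error_log to see what was patched up.

```python
from lxml import etree as ET

# Sample input from the question: <city> is never closed.
xml_data = """<city>
<address>
<zipcode>4455</zipcode>
</address>
"""

# A default (strict) parser rejects input that is not well-formed.
try:
    ET.fromstring(xml_data, ET.XMLParser())
    strict_ok = True
except ET.XMLSyntaxError as exc:
    strict_ok = False
    print("strict parse failed:", exc)

# recover=True asks the parser to keep going and repair what it can.
parser = ET.XMLParser(recover=True)
tree = ET.fromstring(xml_data, parser)

# The recovered tree can be queried as usual.
zipcodes = [z.text for z in tree.xpath("//zipcode")]
print(zipcodes)

# Problems the parser recovered from are recorded on the parser.
for entry in parser.error_log:
    print(entry.line, entry.message)
```

Recovery is best-effort: for heavily damaged input the repaired tree may not match what the author intended, so it is safer to fix the source document where that is possible.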
Upvotes: 2