SyncMaster

Reputation: 9946

Python gives 'Not well-formed xml' error because of presence of '&' characters

I am reading an XML file using Python. The file contains unescaped & characters, so when I run my code it fails with the following error:

xml.parsers.expat.ExpatError: not well-formed (invalid token):

Is there a way to make Python ignore the & check?

Upvotes: 8

Views: 15041

Answers (2)

Kumar

Reputation: 237

For me, adding the line "<?xml version='1.0' encoding='iso-8859-1'?>" at the front of the string did the trick.

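>>> import xml.etree.ElementTree as ET
>>> # Bytes literal: \xe9 is 'é' in ISO-8859-1, the encoding the declaration announces.
>>> text = b'''<?xml version="1.0" encoding="iso-8859-1"?>
... <seuss><fish>red</fish><fish>blu\xe9</fish></seuss>'''
>>> doc = ET.fromstring(text)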

See this page: https://mail.python.org/pipermail/tutor/2006-November/050757.html

Upvotes: 2

Michael Kay

Reputation: 163458

No, you can't ignore the check. Your 'xml file' is not an XML file: to be well-formed XML, each literal ampersand would have to be escaped as &amp;. Therefore, no software that is designed to read XML files will parse it without error. You need to correct the software that generated this file so that it generates proper ("well-formed") XML. All the benefits of using XML for interchange disappear entirely if senders emit data that isn't well-formed and receivers try to patch it up.
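
To illustrate what fixing the generator could look like, here is a minimal sketch in Python. It assumes the producing side is also Python and builds the document by string concatenation; the element names and the sample value "Tom & Jerry" are made up for the example and are not from the question. It shows the bare ampersand being rejected, then two ways to emit it correctly: escaping the text with xml.sax.saxutils.escape, or building the tree with ElementTree and letting it escape on serialization.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical data containing a bare ampersand.
raw_value = "Tom & Jerry"

# Naive string concatenation yields a document that is not well-formed.
broken = "<show><title>" + raw_value + "</title></show>"
try:
    ET.fromstring(broken)
    broken_parsed = True
except ET.ParseError:
    broken_parsed = False

# Option 1: escape text at generation time ('&' becomes '&amp;').
fixed = "<show><title>" + escape(raw_value) + "</title></show>"
title = ET.fromstring(fixed).findtext("title")

# Option 2: build the tree with ElementTree, which escapes text when serializing.
root = ET.Element("show")
ET.SubElement(root, "title").text = raw_value
serialized = ET.tostring(root, encoding="unicode")
```

The parser hands back the original text ("Tom & Jerry") once the input is well-formed, so the consumer never needs to know the ampersand was escaped in transit.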

Upvotes: 8
