Reputation: 1460
I am reading a file with a jml extension. The code is very simple and it reads
import xml.etree.ElementTree as ET
tree = ET.parse('VOAPoints_2010_M25.jml')
root = tree.getroot()
but I get a parsing error:
ParseError: not well-formed (invalid token): line 75, column 16
the file I am trying to read is a dataset that has been used before so I am confident that there are no problems with it.
Upvotes: 4
Views: 353
Reputation: 2946
Sorry for using an answer as a question, but formatting this inside a comment is painful. Does the code below solve your problem?
import xml.etree.ElementTree as ET
myParser = ET.XMLParser(encoding="utf-8")
tree = ET.parse('VOAPoints_2010_M25.jml',parser=myParser)
root = tree.getroot()
Upvotes: 2
Reputation: 107567
Since the pound sign was the issue, you can escape it with the character entity £
. Python can even automate the replace in XML file by iteratively reading each line and replacing it conditionally on the pound symbol:
import xml.etree.ElementTree as ET
oldfile = "VOAPoints_2010_M25.jml"
newfile = "VOAPoints_2010_M25_new.jml"
with open(oldfile, 'r') as otxt:
for rline in otxt:
if "£" in rline:
rline = rline.replace("£", "£")
with open(newfile, 'a') as ntxt:
ntxt.write(rline)
tree = ET.parse(newfile)
root = tree.getroot()
Upvotes: 1