SantoshGupta7

Reputation: 6197

How to locate an XML error in Python given the line number and column number?

I am getting an error when I parse my XML. The error gives a line and column number, but I am not sure how to locate that position in the response.

My code

import requests
from xml.etree import ElementTree as ET

urlBase = 'https://www.goodreads.com/review/list_rss/'
urlMiddle = '?shelf=read&order=d&sort=rating&per_page=200&page='
finalUrl = urlBase + str(32994) + urlMiddle + str(1)
resp = requests.get(finalUrl)
x = ET.fromstring(resp.content)

Error

  File "<string>", line unknown
ParseError: not well-formed (invalid token): line 952, column 1023

I tried printing the contents, but the output is just one line:

resp.content

The output is too big to print here.

So I'm not sure how to check a specific line since it's just one line.

Upvotes: 1

Views: 527

Answers (2)

Laurent LAPORTE

Reputation: 22992

You are trying to parse HTML content with an XML parser. You may run into problems if the content is not valid XML, that is, if it is not XHTML.

Instead, you can use an HTML parser such as the one available in lxml.

For instance

from io import BytesIO
from lxml import etree
parser = etree.HTMLParser()
tree = etree.parse(BytesIO(resp.content), parser)

This should solve your issue.

Upvotes: 1

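Most likely you are on Windows and the printed output isn't respecting line breaks such as \n.

Try adding this right after the line where you get resp (the file must be opened in binary write mode, because resp.content is bytes):

open('resp.xml', 'wb').write(resp.content)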

Then, you can open resp.xml in an editor and see what line 952 looks like.
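If you would rather stay in Python than open an editor, the line and column can also be read off the exception itself. The following is only a minimal sketch: the helper name locate_parse_error and the context parameter are made up for illustration, and it assumes the parser's line count matches the line breaks that bytes.splitlines() sees. It relies on ParseError.position, which ElementTree documents as a (line, column) tuple.

```python
from typing import Optional, Tuple
from xml.etree import ElementTree as ET


def locate_parse_error(data: bytes, context: int = 40) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, snippet) for the first parse error, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # position is a (line, column) tuple; the line number is 1-based.
        line, column = err.position
        lines = data.splitlines()
        # Guard against an error reported past the last line (e.g. at EOF).
        bad_line = lines[line - 1] if 0 < line <= len(lines) else b""
        start = max(column - context, 0)
        snippet = bad_line[start:column + context]
        # Decode leniently so a stray byte does not raise a second error.
        return line, column, snippet.decode("utf-8", errors="replace")
    return None


if __name__ == "__main__":
    # A bare "&" is not well-formed XML, so this sample fails to parse.
    sample = b"<root>\n<a>ok</a>\n<b>bad & stuff</b>\n</root>"
    print(locate_parse_error(sample))
```

With the question's code you would call locate_parse_error(resp.content) and inspect the returned snippet, which shows the bytes around the reported column on the offending line.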

Upvotes: 1
