Reputation: 6197
I am getting an error when I parse my XML. It gives a line and column number, but I am not sure how to go about locating that position.
My code:
import requests
from xml.etree import ElementTree as ET

urlBase = 'https://www.goodreads.com/review/list_rss/'
urlMiddle = '?shelf=read&order=d&sort=rating&per_page=200&page='
finalUrl = urlBase + str(32994) + urlMiddle + str(1)
resp = requests.get(finalUrl)
x = ET.fromstring(resp.content)
Error:
File "<string>", line unknown
ParseError: not well-formed (invalid token): line 952, column 1023
I tried printing the contents, but it all comes out on one line:
resp.content
The output is too big to post here, so I'm not sure how to check a specific line when everything is on a single line.
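A minimal sketch with a made-up payload (standing in for resp.content, assumed to be UTF-8) shows why bytes content can look like a single line: evaluating a bytes object displays its repr, where newlines appear as \n escapes, while decoding and splitting recovers the lines the parser actually counts.

```python
# Hypothetical sample payload standing in for resp.content (a bytes object).
sample = b"<rss>\n<item>one</item>\n<item>two</item>\n</rss>"

# The repr shows newlines as escape sequences, so it prints on one line.
print(repr(sample))

# Decoding and splitting recovers the real lines the XML parser counts.
lines = sample.decode("utf-8").splitlines()
print(len(lines))  # → 4
print(lines[1])    # → <item>one</item>
```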
Reputation: 22992
You are trying to parse HTML content with an XML parser. You may run into problems if the content is not valid XML, i.e. if it is not XHTML.
Instead, you can use an HTML parser, such as the one provided by lxml.
For instance:
from io import BytesIO
from lxml import etree

parser = etree.HTMLParser()
tree = etree.parse(BytesIO(resp.content), parser)
This should get you past the error, since lxml's HTML parser recovers from markup that is not well-formed XML.
Reputation: 6826
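Most likely the content does contain line breaks (the error reports line 952), but they are shown as literal \n escapes rather than rendered, so everything appears on one line.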
Try adding the following right after the line where you get resp:
with open('resp.xml', 'wb') as f:  # 'wb' because resp.content is bytes
    f.write(resp.content)
Then you can open resp.xml in an editor and jump to line 952, column 1023, to see what is there.
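If you would rather not scroll through the file by hand, a minimal sketch like the one below catches the ParseError, reads its position attribute (a (line, column) tuple documented on xml.etree.ElementTree.ParseError), and returns the text around that spot. The function name show_parse_error and the sample data are hypothetical, and it assumes the payload is UTF-8; for non-ASCII text the reported column may not line up exactly with character offsets, which is why it returns a window of text rather than a single character.

```python
import xml.etree.ElementTree as ET


def show_parse_error(data: bytes, width: int = 40) -> str:
    """Parse data; if it fails, return the text surrounding the error position."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple; line numbers start at 1.
        line_no, column = err.position
        lines = data.decode("utf-8", errors="replace").splitlines()
        if line_no > len(lines):
            return ""  # error reported past the last line (e.g. truncated input)
        line = lines[line_no - 1]
        # Show `width` characters on either side of the reported column.
        start = max(column - width, 0)
        return line[start:column + width]
    return ""  # no error: the document is well-formed


# Hypothetical sample: an unescaped "&" is a common cause of "invalid token".
bad = b"<rss>\n<item>ok</item>\n<item>Tom & Jerry</item>\n</rss>"
print(show_parse_error(bad, width=10))
```

Applied to the question's response, print(show_parse_error(resp.content)) would show what sits around line 952, column 1023.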