I'm trying to read the title of a web page with lxml. I thought something like (1) would work, but it throws an error. Any ideas or tips?
(1)
import lxml.html

versionPreCheck = lxml.html.parse("URL")
versionCheck = versionPreCheck.find(".//title").text
LatestVersion = versionCheck  # .text is already a str, so no .read() is needed
Error:
Traceback (most recent call last):
File "python", line 132, in <module>
File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFile
File "src/lxml/parser.pxi", line 600, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 710, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 637, in lxml.etree._raiseParseError
OSError: Error reading file 'bazorkversion--grify.repl.co': failed to load external entity "bazorkversion--grify.repl.co"
And here is the page: https://bazorkversion--grify.repl.co/
Its title is the string "PreAlpha 3" (it appears in the browser tab, next to the site's favicon).
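You aren't the only one receiving this error, and it comes from lxml rather than from your title lookup: on its own, lxml can only load a document from a local file or an http:// or ftp:// URL (HTTPS is not supported), and a string without a scheme, like 'bazorkversion--grify.repl.co' in your traceback, is treated as a local file name.
Instead, try another web-scraping module such as BeautifulSoup, together with the requests module to fetch the page from the URL: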
>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> r = requests.get('https://bazorkversion--grify.repl.co/')
>>> soup = BS(r.text, 'lxml')
>>> soup.title.text
'PreAlpha 3'
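If you would rather keep lxml for the parsing step, a minimal sketch (assuming requests and lxml are both installed) is to let requests download the page and hand the response body to lxml.html.fromstring(), which parses markup that is already in memory, so lxml never has to open the URL itself. The names title_from_html and fetch_title are hypothetical helpers chosen for illustration, and the 10-second timeout is an arbitrary choice.

```python
import lxml.html


def title_from_html(html_text):
    """Return the text of the first <title> element, or None if there is none.

    Hypothetical helper: accepts HTML as str or bytes.
    """
    # fromstring() parses in-memory markup, so lxml does no URL loading here.
    root = lxml.html.fromstring(html_text)
    title = root.find(".//title")  # same XPath-style lookup as in the question
    return title.text if title is not None else None


def fetch_title(url):
    """Hypothetical helper: download the page with requests, then parse it with lxml."""
    # Imported here so title_from_html can be used even where requests is absent.
    import requests

    response = requests.get(url, timeout=10)  # arbitrary 10-second timeout
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # Pass raw bytes so lxml can honour any encoding declared in the page.
    return title_from_html(response.content)
```

For example, fetch_title("https://bazorkversion--grify.repl.co/") should give the same 'PreAlpha 3' as the BeautifulSoup session above, assuming the page has not changed since.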