Grify Dev

Reputation: 70

I need to create a new string whose contents are the title of a website.

I thought something like (1) would work, but it throws an error. Any ideas or tips?

(1)

versionPreCheck = lxml.html.parse("URL")
versionCheck = versionPreCheck.find(".//title").text

LatestVersion = (versionCheck.read())

Error:

Traceback (most recent call last):
  File "python", line 132, in <module>
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 600, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 710, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 637, in lxml.etree._raiseParseError
OSError: Error reading file 'bazorkversion--grify.repl.co': failed to load external entity "bazorkversion--grify.repl.co"

And here is the title:

The page is https://bazorkversion--grify.repl.co/ and its title is the string "PreAlpha 3" (it appears at the top of the browser tab, next to the site's favicon).

Upvotes: 0

Views: 234

Answers (1)

TerryA

Reputation: 60014

You aren't the only one who has run into this error, and it could be a fault in lxml.

Instead, try another web-scraping module such as BeautifulSoup, together with the requests module to fetch the page from the URL:

>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> r = requests.get('https://bazorkversion--grify.repl.co/')
>>> soup = BS(r.text, 'lxml')
>>> soup.title.text
'PreAlpha 3'
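
If installing requests and BeautifulSoup is not an option, the same idea (download the page yourself, then parse the text) can be sketched with only the standard library. This is a swapped-in technique, not part of the answer above: the names TitleParser, extract_title and fetch_title are hypothetical, and the sketch assumes the page has an ordinary <title> element and that the URL includes its https:// scheme.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html_text):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


def fetch_title(url, timeout=10):
    """Download the page at url (scheme included) and return its title."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
    return extract_title(html_text)
```

With this in place, the line from the question would become something like LatestVersion = fetch_title("https://bazorkversion--grify.repl.co/"). The result is already a plain string, so no .read() call is needed on it.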

Upvotes: 1
