Getting error while web scraping the link

Question

I get an error when scraping the link below. Can anybody help me fix the error and show code that scrapes all of the text data from the page?

from urllib.request import Request, urlopen
link='https://novelfull.com/warriors-promise/chapter-1.html'
req = Request(link) 
webpage = urlopen(req).read()

Jacob Lee · Accepted Answer

You could try using requests:

>>> import requests
>>> res = requests.get("https://novelfull.com/warriors-promise/chapter-1.html")
>>> res.raise_for_status()
>>> res.text
'...'
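
The question does not say which error was raised. If it turns out to be an HTTP 403, one frequent cause with urllib is that some sites reject the default Python-urllib User-Agent. Here is a minimal sketch that keeps the urllib code from the question, assuming that is the cause. The helper names build_request and fetch_html and the "Mozilla/5.0" header value are illustrative, not something the site documents.

```python
from urllib.request import Request, urlopen

URL = "https://novelfull.com/warriors-promise/chapter-1.html"


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-Agent (hypothetical helper)."""
    # Assumption: the server only objects to the default "Python-urllib/x.y" agent.
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and decode it; urlopen raises HTTPError on 4xx/5xx."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Nothing is downloaded until fetch_html(URL) is called. If the request still fails, the exception's code and reason attributes should show what the server objected to.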

To get the content of the page (the actual story, in this case), you would likely need an HTML parser, such as BeautifulSoup4 or lxml.

BeautifulSoup4

import bs4
import requests

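res = requests.get("https://novelfull.com/warriors-promise/chapter-1.html")
res.raise_for_status()  # stop early on a 4xx/5xx response
soup = bs4.BeautifulSoup(res.text, features="html.parser")
# select() returns a list of matches; take the first one
elem = soup.select("#chapter-content div:nth-child(3) div")[0]
content = elem.get_text()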

BeautifulSoup4 and requests are third-party modules, so be sure to install them: pip install beautifulsoup4 requests.
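
The selector above depends on how the page is laid out, so it may stop matching if the site changes. This sketch runs the same kind of extraction on a small inline HTML sample, so it needs no network access. The sample markup and the extract_text helper are made up for illustration and are not the real page's structure. Only the chapter-content id is taken from the answer.

```python
import bs4

# Made-up sample; the real page's markup may differ.
SAMPLE_HTML = """
<div id="chapter-content">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""


def extract_text(html, selector="#chapter-content"):
    """Return the text of the first element matching selector, or None."""
    soup = bs4.BeautifulSoup(html, features="html.parser")
    matches = soup.select(selector)  # always a list, possibly empty
    if not matches:
        return None
    # strip=True drops whitespace-only strings; pieces are joined with newlines
    return matches[0].get_text(separator="\n", strip=True)


content = extract_text(SAMPLE_HTML)
```

Checking for an empty match list avoids the IndexError that indexing with [0] raises when the selector finds nothing.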

lxml

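from urllib.request import urlopen
from lxml import etree

res = urlopen("https://novelfull.com/warriors-promise/chapter-1.html")
htmlparser = etree.HTMLParser()  # note the capital P
tree = etree.parse(res, htmlparser)
# xpath() returns a list of matching elements; take the first one
elem = tree.xpath("//div[@id='chapter-content']//div[3]//div")[0]
# itertext() also collects text from nested tags; .text alone stops at the first child
content = "".join(elem.itertext())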

lxml is a third-party module, so be sure to install it: pip install lxml.
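
With lxml, xpath() returns a list of elements, and an element's .text holds only the text that comes before its first child tag. This sketch shows one way to collect all nested text, again on a made-up inline sample rather than the real page. The extract_text helper and the sample markup are hypothetical.

```python
from lxml import etree

# Made-up sample; the real page's markup may differ.
SAMPLE_HTML = """
<div id="chapter-content">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""


def extract_text(html, xpath="//div[@id='chapter-content']"):
    """Return all text under the first XPath match, or None if nothing matches."""
    root = etree.fromstring(html, etree.HTMLParser())
    matches = root.xpath(xpath)  # a list of elements, possibly empty
    if not matches:
        return None
    # itertext() walks every descendant text node; skip whitespace-only pieces
    pieces = (text.strip() for text in matches[0].itertext())
    return "\n".join(piece for piece in pieces if piece)


content = extract_text(SAMPLE_HTML)
```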
