How to extract specific content from a website?

Question

I am working on a web scraping project using Python and BeautifulSoup. I want to visit 1000+ URLs and extract the publication month of each issue.

So far I have tried the following code, but it raises an error. I'm fairly new to web scraping.

from bs4 import BeautifulSoup
import requests
import time

page = requests.get("https://academic.oup.com/cesifo/issue/64/3?browseBy=volume")
time.sleep(5)  # Python's sleep lives in the time module (thread.sleep does not exist)
soup = BeautifulSoup(page.content, 'html.parser')

The error is:

requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(10054, 'WSAECONNRESET')"))

Could you suggest a way around this?

SIM · Accepted Answer

Try sending a User-Agent header with the request; the site appears to drop connections from clients that send requests' default one. I'm not sure this is exactly the output you want to grab, but the fix here is to send that header.
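By default requests identifies itself as `python-requests/<version>`, which is presumably what the server objects to. Since the question involves 1000+ issue pages, a minimal sketch, assuming the User-Agent is all the site checks, could set the header once on a `requests.Session` and reuse it for every URL. The names `make_session` and `fetch_all` and the one-second delay are illustrative choices, not part of the original answer.

```python
import time

import requests

# Without an explicit header, requests sends "python-requests/<version>".
print(requests.utils.default_user_agent())

# Browser-like header, as in the answer; assumed to be what the server checks.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def make_session():
    """Create a session that sends HEADERS with every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def fetch_all(urls, session, delay=1.0):
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    Returns a dict mapping URL -> raw page bytes; failed URLs are skipped.
    """
    pages = {}
    for url in urls:
        try:
            response = session.get(url, timeout=30)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            pages[url] = response.content
        except requests.RequestException as exc:
            # ConnectionError, Timeout and HTTPError all derive from RequestException.
            print(f"skipping {url}: {exc}")
        time.sleep(delay)  # avoid hammering the server
    return pages
```

For example, `fetch_all(["https://academic.oup.com/cesifo/issue/64/3?browseBy=volume"], make_session())` would return the raw HTML of that issue page keyed by its URL. Reusing one session also keeps the underlying connection open between requests, which helps when looping over many pages.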

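from bs4 import BeautifulSoup
import requests

url = "https://academic.oup.com/cesifo/issue/64/3?browseBy=volume"

# Send a browser-like User-Agent instead of requests' default one.
page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
page.raise_for_status()  # fail loudly on 4xx/5xx responses
soup = BeautifulSoup(page.content, 'html.parser')
# The issue details (volume, issue, date) are in the .issue-info-pub child of <h1>.
oDate = soup.select_one("h1 > .issue-info-pub").get_text(strip=True)
print(oDate)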

Output:

Volume 64, Issue 3, September 2018
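To get just the published month out of that string, a small regular expression is enough. This is a minimal sketch assuming every issue label ends in `<Month> <YYYY>` like the output above; `extract_month` is a hypothetical helper name, and labels that use a season or a month range would need a different pattern.

```python
import re

# Assumed format: the label ends with "<Word> <4-digit year>", e.g. "September 2018".
ISSUE_DATE = re.compile(r"([A-Za-z]+)\s+(\d{4})\s*$")


def extract_month(issue_info):
    """Return (month, year) from text like 'Volume 64, Issue 3, September 2018'.

    Returns None when the text does not end with a '<Word> <4-digit year>' pair.
    """
    match = ISSUE_DATE.search(issue_info.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(extract_month("Volume 64, Issue 3, September 2018"))  # ('September', 2018)
```

Applied to the `oDate` value from the answer's code, this yields the month and year for one issue; running it over each page fetched in a loop gives the month for every URL.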
