Cannot scrape a website with BeautifulSoup4

Question

The text I'm trying to scrape is the title 123rd Meeting from

https://www.bcb.gov.br/en/#!/c/copomstatements/1724

To do so, I use this code:

import urllib.request           #get the HTML page from url 
import urllib.error

from bs4 import BeautifulSoup


# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
   page = response.read()

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)

# Inspect: 123rd Meeting
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)

However, the command

print(soup)

returns neither the title (123rd Meeting) nor the body (In light of the .... target by 25 basis points).

Ali · Accepted Answer

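You can't get the title with a plain HTTP client such as urllib or requests, because the element you're trying to extract is rendered with JavaScript after the page loads, so it isn't in the HTML the server returns. You will need a tool that drives a real browser, such as Selenium, to achieve your goal.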
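It's also worth noting that everything after the # in a URL is a fragment, which an HTTP client keeps to itself rather than sending to the server. A minimal sketch using only the standard library shows what urllib actually ends up requesting for the URL in the question:

```python
from urllib.parse import urldefrag

url = "https://www.bcb.gov.br/en/#!/c/copomstatements/1724"

# Split the URL into the part that is requested and the client-side fragment
base, fragment = urldefrag(url)

print(base)      # https://www.bcb.gov.br/en/
print(fragment)  # !/c/copomstatements/1724
```

So the request only ever reaches the English landing page; the #!/c/copomstatements/1724 part is interpreted by the page's own JavaScript once it runs in a browser.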

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

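driver = webdriver.Chrome()
try:
    driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
    # Wait up to 10 seconds for JavaScript to render the <h3> heading
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//h3')))
    # find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
    title = driver.find_element(By.XPATH, '//h3').text
    print(title)
finally:
    # quit() closes the browser and ends the WebDriver session, even on error
    driver.quit()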

Output:

123rd Meeting
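If you would rather keep the BeautifulSoup parsing from the question, one option is to hand the rendered HTML (driver.page_source in Selenium) to BeautifulSoup once the wait has finished. The sketch below is only illustrative: extract_title is a hypothetical helper, and sample_html is a hand-written stand-in for the rendered page, built from the class names in the question rather than copied from the live site.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the first <h3 class="BCTituloPagina">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any single class on the element, so the extra
    # "ng-binding" class does not need to be listed
    tag = soup.find("h3", class_="BCTituloPagina")
    return tag.get_text(strip=True) if tag else None


# Stand-in for driver.page_source (assumed markup, not the real page)
sample_html = '<div><h3 class="BCTituloPagina ng-binding">123rd Meeting</h3></div>'
print(extract_title(sample_html))  # 123rd Meeting
```

With a live driver you would call extract_title(driver.page_source) after the WebDriverWait and before closing the browser.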
