user3889486

Cannot scrape a website with BeautifulSoup4

The text I'm trying to scrape is the title "123rd Meeting" from

https://www.bcb.gov.br/en/#!/c/copomstatements/1724

To do so, I use this code:

import urllib.request  # fetch the HTML page from a URL

from bs4 import BeautifulSoup


# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
    page = response.read()

# parse the HTML with Beautiful Soup and store it in `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)

# Inspect: <h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)

However, the command

print(soup)

returns neither the title (123rd Meeting) nor the body text (In light of the .... target by 25 basis points.).
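One way to confirm that the title is missing from the HTML the server sends (rather than the selector being wrong) is to search the raw, unrendered text for it. Below is a minimal sketch using only the standard library's html.parser. The helper text_in_static_html and the two sample strings are hypothetical illustrations, not content taken from the BCB site.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_in_static_html(html: str, needle: str) -> bool:
    """Return True if `needle` appears in the text of the raw (unrendered) HTML."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return needle in "".join(parser.chunks)


# Hypothetical samples: an empty app shell that JavaScript would fill in later,
# versus markup where the title is already present in the HTML.
shell_html = '<body><div id="app"></div><script src="app.js"></script></body>'
rendered_html = '<h3 class="BCTituloPagina">123rd Meeting</h3>'

print(text_in_static_html(shell_html, "123rd Meeting"))     # False
print(text_in_static_html(rendered_html, "123rd Meeting"))  # True
```

With the question's code you would call text_in_static_html(page.decode("utf-8", "replace"), "123rd Meeting"). A False result suggests the text is added client-side, so no HTML parser will find it in that response.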

Answers (1)

Ali

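You can't extract the title with a plain HTTP client such as urllib (or the requests library), because the element you're trying to extract is rendered by JavaScript in the browser. It is not part of the HTML the server sends back, so BeautifulSoup never sees it. To get the rendered page, you can use Selenium, which drives a real browser and lets the JavaScript run before you read the element.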
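Part of the reason shows in the URL itself: everything after # is a fragment, which HTTP clients such as urllib strip before sending the request. The server therefore only receives https://www.bcb.gov.br/en/, and presumably the page's JavaScript reads the #!/c/copomstatements/1724 part to decide which statement to display. A minimal sketch with urllib.parse.urldefrag showing that split:

```python
from urllib.parse import urldefrag

url = "https://www.bcb.gov.br/en/#!/c/copomstatements/1724"

# urldefrag() separates the part sent to the server from the client-side fragment
base, fragment = urldefrag(url)

print(base)      # https://www.bcb.gov.br/en/
print(fragment)  # !/c/copomstatements/1724
```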

Code:

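from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
    # Wait up to 10 s for the JavaScript-rendered <h3> to become visible.
    # until() returns the located element, so no separate lookup is needed
    # (the old find_element_by_xpath helper is gone from current Selenium 4).
    h3 = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, '//h3'))
    )
    print(h3.text)
except TimeoutException:
    print('The title did not appear within 10 seconds')
finally:
    # quit() closes all browser windows and ends the WebDriver session
    driver.quit()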

Output:

123rd Meeting
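WebDriverWait(driver, 10).until(...) works by polling: it calls the condition repeatedly until it returns something truthy, returns that value, and raises TimeoutException once the timeout expires (Selenium polls every 0.5 s by default and ignores NoSuchElementException while waiting). The sketch below is a pure-Python illustration of that polling idea, not Selenium's source; wait_until, WaitTimeout and the fake title_rendered condition are hypothetical names.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy after the timeout."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    Exceptions listed in `ignored` are treated like a falsy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical condition: the "title" only appears on the third poll,
# mimicking an element that JavaScript renders after the page loads.
state = {"calls": 0}


def title_rendered():
    state["calls"] += 1
    return "123rd Meeting" if state["calls"] >= 3 else None


result = wait_until(title_rendered, timeout=5, poll=0.01)
print(result)  # 123rd Meeting
```

Returning the condition's value (rather than just True) is what lets the answer's code read .text straight from the result of until().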
