Reputation: 656
The text I'm trying to scrape is the title "123rd Meeting" from the page requested in the code below.
To do so, I use this code:
import urllib.request #get the HTML page from url
import urllib.error
from bs4 import BeautifulSoup
# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
    page = response.read()
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)
# Inspect: <h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)
However, the command
print(soup)
prints neither the title "123rd Meeting" nor the body "In light of the .... target by 25 basis points."
Upvotes: 1
Views: 65
Reputation: 1357
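You can't get the title with a plain HTTP request (urllib.request, as in your code, or the requests library), because the element you're trying to extract is rendered with JavaScript after the page loads. Everything after the # in the URL is handled on the client side and is never sent to the server, so the HTML you download doesn't contain the title. You will need a tool that drives a real browser, such as Selenium, to achieve your goal.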
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//h3')))
title = driver.find_element(By.XPATH, '//h3').text
print(title)
driver.quit()
Output:
123rd Meeting
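As a side note on why the plain request comes back without the title: the part of the URL after the # is a fragment, which stays in the browser and is not part of the HTTP request. A minimal sketch using only the standard library (the variable names here are just for illustration) shows what is actually requested:

```python
from urllib.parse import urlsplit

url = "https://www.bcb.gov.br/en/#!/c/copomstatements/1724"
parts = urlsplit(url)

# The fragment is everything after '#'. It is kept on the client side
# and is not sent to the server as part of the request.
print(parts.fragment)

# Dropping the fragment leaves the URL the server actually sees.
requested = parts._replace(fragment="").geturl()
print(requested)
```

So the server only ever sees the base page, and the JavaScript running in the browser reads the fragment and loads the statement afterwards. That is the step urllib never performs.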
Upvotes: 1