Reputation: 31
I am trying to scrape an upcoming event date from reuters.com using Python and the BeautifulSoup package.
Unfortunately, getting the upcoming earnings event date and time out of the HTML is harder than I expected.
I do not understand why the script below prints nothing, even though I can see the value when I inspect the target URL in the browser. Does anybody know why? Is there a viable workaround?
import requests
from bs4 import BeautifulSoup

header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:70.0) Gecko/20100101 Firefox/70.0', }
URL = 'https://www.reuters.com/companies/SAPG.DE/events'
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='__next')
job_elems = results.find_all('section', class_='Events-section-2YwsJ')
for job_elem in job_elems:
    event_type = job_elem.find('h3').text
    if event_type.find('Events') != -1:
        print(job_elem.find('h3').text)
        items = job_elem.find_all('div', class_='EventList-event-Veu-f')
        for item in items:
            title = item.find('span').text
            earnings_time = item.find('time').get_text()
            if title.find('Earnings Release') != -1:
                print(earnings_time)
The class attribute of the element in question is EventList-date-cLNT9, which I have never seen before.
Upvotes: 2
Views: 366
Reputation: 20042
The reason is that those events are added dynamically by JavaScript, which means they are not present in the HTML you get back.
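One way to confirm this is to count how often the target class appears in the markup you actually received. The sketch below uses only the standard library; `ClassCounter`, `count_class`, and the two HTML strings are hypothetical stand-ins made up for illustration, not the real Reuters markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical stand-in for what the server sends before JavaScript runs.
static_html = '<div id="__next"><section class="Events-section-2YwsJ"></section></div>'
# Hypothetical stand-in for what the browser shows after JavaScript runs.
rendered_html = (
    '<div id="__next"><section class="Events-section-2YwsJ">'
    '<div class="EventList-event-Veu-f"><time>22 Apr 2021</time></div>'
    '</section></div>'
)

print(count_class(static_html, "EventList-event-Veu-f"))    # 0
print(count_class(rendered_html, "EventList-event-Veu-f"))  # 1
```

On the real page you would pass `page.text` from the `requests` call; a count of 0 there would suggest the event list is filled in client-side.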
However, there is an API you can query to get the events.
Here's how:
import requests
api_url = "https://www.reuters.com/companies/api/getFetchCompanyEvents/SAPG.DE"
response = requests.get(api_url).json()
for event in response["market_data"]["upcoming_event"]:
print(f"{event['name']} - {event['time']}")
Output:
SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual) - 2021-03-01T16:45:00Z
Q1 2021 SAP SE Earnings Release - 2021-04-22T06:30:00Z
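Since the question only asks for the earnings release, here is a minimal sketch of filtering that response and parsing the timestamp. It assumes the JSON has the shape used above (`market_data` → `upcoming_event`, each entry with `name` and `time`); `upcoming_earnings` is a hypothetical helper, and `sample` is rebuilt from the output shown above rather than from a live call.

```python
from datetime import datetime, timezone


def upcoming_earnings(payload):
    """Return (name, datetime) pairs for events named like an earnings release."""
    # Assumed response shape; missing keys yield an empty list instead of a KeyError.
    events = payload.get("market_data", {}).get("upcoming_event", [])
    results = []
    for event in events:
        name = event.get("name", "")
        if "Earnings Release" not in name:
            continue
        # Timestamps in the output above look like 2021-04-22T06:30:00Z (UTC).
        when = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
        results.append((name, when.replace(tzinfo=timezone.utc)))
    return results


# Sample payload reconstructed from the output above (only the two fields used here).
sample = {
    "market_data": {
        "upcoming_event": [
            {
                "name": "SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual)",
                "time": "2021-03-01T16:45:00Z",
            },
            {"name": "Q1 2021 SAP SE Earnings Release", "time": "2021-04-22T06:30:00Z"},
        ]
    }
}

for name, when in upcoming_earnings(sample):
    print(name, when.isoformat())
# Q1 2021 SAP SE Earnings Release 2021-04-22T06:30:00+00:00
```

With a live call you would pass `requests.get(api_url).json()` in place of `sample`.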
Upvotes: 1
Reputation: 165
This happens because the time tag is loaded by JavaScript, but bs4 only parses the HTML it is given. You have two options: use Selenium, or use their API.
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
URL = 'https://www.reuters.com/companies/SAPG.DE/events'
# driver.get() returns None, so there is nothing to assign here
driver.get(URL)
# Parse the DOM as rendered by the browser, not the raw server response
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find(id='__next')
job_elems = results.find_all('section', class_='Events-section-2YwsJ')
for job_elem in job_elems:
    event_type = job_elem.find('h3').text
    if event_type.find('Events') != -1:
        print(job_elem.find('h3').text)
        items = job_elem.find_all('div', class_='EventList-event-Veu-f')
        for item in items:
            title = item.find('span').text
            event_time = item.find('time').text
            print(f"Title: {title}, Time: {event_time}")
driver.quit()
Output:
Upcoming Events
Title: SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual), Time: 1 Mar 2021 / 6PM EET
Title: Q1 2021 SAP SE Earnings Release, Time: 22 Apr 2021 / 8AM EET
Upvotes: 1