m.eder

Reputation: 31

Unable to scrape date/time info using BeautifulSoup

I am trying to scrape an upcoming event date from reuters.com using Python and the BeautifulSoup package.

Unfortunately, extracting the upcoming earnings event date and time from the HTML is proving harder than expected.

I do not understand why the script below prints nothing, although I can see the value when inspecting the target URL in the browser. Does anybody know why? Is there a viable workaround?

import requests
from bs4 import BeautifulSoup

header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:70.0) Gecko/20100101 Firefox/70.0', }
URL = 'https://www.reuters.com/companies/SAPG.DE/events'
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='__next')
job_elems = results.find_all('section', class_='Events-section-2YwsJ')

for job_elem in job_elems:
    event_type = job_elem.find('h3').text
    if event_type.find('Events') != -1:
        print(job_elem.find('h3').text)
        items = job_elem.find_all('div', class_='EventList-event-Veu-f')
        for item in items:
            title = item.find('span').text
            earnings_time = item.find('time').get_text()
            if title.find('Earnings Release') != -1:
                print(earnings_time)

The class attribute of the element in question is EventList-date-cLNT9, which I have never seen before.

Upvotes: 2

Views: 366

Answers (2)

baduker

Reputation: 20042

The reason is that those events are added dynamically by JavaScript, which means they are not present in the HTML that requests gets back.
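One way to confirm this is to count the time tags in the raw response. The sketch below uses only the standard library; `TagCounter`, `count_tags` and the two HTML strings are hypothetical stand-ins made up for illustration, not the real Reuters markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often a given tag opens in an HTML document."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == self.tag_name:
            self.count += 1


def count_tags(html, tag_name):
    """Return the number of <tag_name> start tags found in html."""
    counter = TagCounter(tag_name)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical stand-ins: what requests receives vs. what the browser renders.
static_html = '<div id="__next"><section><h3>Upcoming Events</h3></section></div>'
rendered_html = (
    '<div id="__next"><section><h3>Upcoming Events</h3>'
    '<div><span>Earnings Release</span><time>22 Apr 2021</time></div>'
    '</section></div>'
)

print(count_tags(static_html, "time"))    # 0
print(count_tags(rendered_html, "time"))  # 1
```

With the real page you would pass page.text from the question's script instead of the sample strings. A count of 0 there would suggest the time elements only appear after JavaScript has run.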

However, there's an API you can query to get the events.

Here's how:

import requests

api_url = "https://www.reuters.com/companies/api/getFetchCompanyEvents/SAPG.DE"
response = requests.get(api_url).json()

for event in response["market_data"]["upcoming_event"]:
    print(f"{event['name']} - {event['time']}")

Output:

SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual) - 2021-03-01T16:45:00Z
Q1 2021 SAP SE Earnings Release - 2021-04-22T06:30:00Z
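If you only need the earnings release, as in the question, the events can be filtered by name and the timestamp parsed into a datetime. This is a minimal sketch, assuming the response keeps the market_data / upcoming_event shape used above; `earnings_release_times` is a hypothetical helper, and `sample` is rebuilt from the printed output rather than fetched live.

```python
from datetime import datetime, timezone


def earnings_release_times(payload, keyword="Earnings Release"):
    """Return (name, datetime) pairs for upcoming events whose name contains keyword."""
    events = payload.get("market_data", {}).get("upcoming_event", [])
    matches = []
    for event in events:
        if keyword in event["name"]:
            # Timestamps look like 2021-04-22T06:30:00Z; the trailing Z means UTC.
            when = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
            matches.append((event["name"], when.replace(tzinfo=timezone.utc)))
    return matches


# Sample payload rebuilt from the output above; only the two fields used there.
sample = {
    "market_data": {
        "upcoming_event": [
            {
                "name": "SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual)",
                "time": "2021-03-01T16:45:00Z",
            },
            {
                "name": "Q1 2021 SAP SE Earnings Release",
                "time": "2021-04-22T06:30:00Z",
            },
        ]
    }
}

for name, when in earnings_release_times(sample):
    print(f"{name} - {when.isoformat()}")
# Q1 2021 SAP SE Earnings Release - 2021-04-22T06:30:00+00:00
```

In the live script you would pass the parsed JSON (the response variable above) instead of sample. Keeping the datetime timezone-aware makes it straightforward to convert to local time later with astimezone().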

Upvotes: 1

AhmedO

Reputation: 165

This happens because the time tag is loaded by JavaScript, while bs4 only parses the static HTML. You have two options: use Selenium, or use their API.

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
URL = 'https://www.reuters.com/companies/SAPG.DE/events'

# driver.get() returns None; the rendered HTML is read from driver.page_source below.
driver.get(URL)

soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find(id='__next')
job_elems = results.find_all('section', class_='Events-section-2YwsJ')

for job_elem in job_elems:
    event_type = job_elem.find('h3').text
    if event_type.find('Events') != -1:
        print(job_elem.find('h3').text)
        items = job_elem.find_all('div', class_='EventList-event-Veu-f')
        for item in items:
            title = item.find('span').text
            time = item.find('time').text
            print(f"Title: {title}, Time: {time}")

driver.quit()

Output:

Upcoming Events
Title: SAP SE at Morgan Stanley Technology, Media and Telecom Conference (Virtual), Time: 1 Mar 2021 / 6PM EET
Title: Q1 2021 SAP SE Earnings Release, Time: 22 Apr 2021 / 8AM EET

Upvotes: 1
