Matteo

Reputation: 53

BeautifulSoup not finding dates

I'm trying to scrape some data from here: https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly.

I'd like to get the dates in the first row (i.e. 31-Mar-21 31-Dec-20 30-Sep-20 30-Jun-20 31-Mar-20).

The problem is that when I try to get the date with bs4, it outputs nothing. I wrote this code:

import requests
from bs4 import BeautifulSoup

url = "https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")

# Locate the container holding the financial tables, then its first <time> tag
a = soup.find("div", attrs={"class": "tables-container"})
date = a.find("time").text

When I run it, I get an empty string. Printing a shows that find() does locate the time tag, but the tag contains no date:

<th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>

Thanks.

Upvotes: 1

Views: 141

Answers (2)

Andrej Kesely

Reputation: 195418

The data is embedded in the page as JSON, inside the element with id __NEXT_DATA__. Here is an example of how to parse it:

import json
import requests
from bs4 import BeautifulSoup

url = "https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

x = data["props"]["initialState"]["markets"]["financials"]["financial_tables"]

headers = x["income_interim_tables"][0]["headers"]
print(*headers, sep="\n")

Prints:

2021-03-31
2020-12-31
2020-09-30
2020-06-30
2020-03-31
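
The headers come back in ISO format, while the question lists them as they appear on the page (31-Mar-21 and so on). If that display format is needed, a minimal sketch of the conversion could look like the following. The helper name to_display_format is hypothetical, the list is hard-coded from the output above rather than fetched, and the month abbreviation assumes the default English/C locale:

```python
from datetime import datetime

# ISO-formatted headers, copied from the output printed above (not fetched live).
iso_headers = ["2021-03-31", "2020-12-31", "2020-09-30", "2020-06-30", "2020-03-31"]


def to_display_format(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' to 'DD-Mon-YY' (hypothetical helper).

    %b is locale-dependent; this assumes the default English/C locale.
    """
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%d-%b-%y")


display_headers = [to_display_format(d) for d in iso_headers]
print(*display_headers)
```

Parsing with strptime first, rather than slicing the string, also rejects malformed values early with a ValueError.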

Upvotes: 3

Pythocrates

Reputation: 553

As I do not have enough reputation to comment:

The problem is that the scraped HTML does not contain the dates. The time tags are empty.
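
This can be checked offline with the markup quoted in the question. The sketch below parses that snippet (with a closing th tag added here for well-formedness, which is an assumption) and shows that find() does return the time tag, but its text is empty:

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the closing </th> is added for well-formedness.
snippet = (
    '<th scope="column"><time class="TextLabel__text-label___3oCVw '
    'TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time></th>'
)

soup = BeautifulSoup(snippet, "html.parser")
time_tag = soup.find("time")

# The tag is present in the static HTML, but it carries no text.
tag_found = time_tag is not None
date_text = time_tag.get_text(strip=True)
print(tag_found, repr(date_text))
```

So the selector is not at fault: the static HTML simply has nothing inside the tag.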

You need a way to scrape the page after the JavaScript that fills in the dates has run. That is a separate topic and requires a headless browser or a similar approach, e.g. https://www.scrapingbee.com/blog/scrapy-javascript/

Upvotes: 0
