BeautifulSoup not finding dates

Question

I'm trying to scrape some data from here: https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly.

I'd like to get the dates in the first row (i.e. 31-Mar-21, 31-Dec-20, 30-Sep-20, 30-Jun-20, 31-Mar-20).

The problem is that when I try to get the dates with bs4, it outputs nothing. I wrote this code:

import requests
from bs4 import BeautifulSoup

url = "https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")

a = soup.find("div", attrs={"class": "tables-container"})
date = a.find("time").text

When I execute it, it gives me nothing. Printing a shows that find() doesn't get the date.

Thanks.

Andrej Kesely · Accepted Answer

The data is embedded within the page in JSON form. You can use this example to parse it:

import json
import requests
from bs4 import BeautifulSoup

url = "https://www.reuters.com/companies/AMPF.MI/financials/income-statement-quarterly"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

x = data["props"]["initialState"]["markets"]["financials"]["financial_tables"]

headers = x["income_interim_tables"][0]["headers"]
print(*headers, sep="\n")

This prints the table's column headers, one per line.
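The long key path in the answer will raise a KeyError as soon as the site renames or drops one of those keys. A more defensive variant might look like the sketch below. It assumes you already have the JSON text from the `#__NEXT_DATA__` script tag as a string. The function name `headers_from_next_data`, the `TABLES_PATH` constant and the sample JSON are made up for illustration; only the key names come from the answer.

```python
import json

# Key path copied from the answer above; the site may change it at any time.
TABLES_PATH = ("props", "initialState", "markets", "financials", "financial_tables")


def headers_from_next_data(raw_json, table_key="income_interim_tables"):
    """Return the first table's headers, or [] if an expected key is missing."""
    node = json.loads(raw_json)
    # Walk the nested dicts one key at a time instead of chaining [] lookups.
    for key in TABLES_PATH:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    tables = node.get(table_key) if isinstance(node, dict) else None
    if not isinstance(tables, list) or not tables or not isinstance(tables[0], dict):
        return []
    return list(tables[0].get("headers", []))


# Hand-written sample that mimics the structure; it is not real page data.
sample = json.dumps({
    "props": {"initialState": {"markets": {"financials": {"financial_tables": {
        "income_interim_tables": [{"headers": ["31-Mar-21", "31-Dec-20"]}]
    }}}}}
})

print(headers_from_next_data(sample))
print(headers_from_next_data('{"props": {}}'))
```

With the real page you would pass `soup.select_one("#__NEXT_DATA__").contents[0]` as `raw_json`. An empty list then tells you the layout has changed, rather than the script crashing partway through the lookup.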
