Scraping table of data from webpage with inconsistently nested html tags

Question

I am trying to scrape some data from the tables at https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/ . Specifically, I want to scrape the 'Metropolitan tram' table. However, the HTML elements aren't consistently structured, and I am unsure how to identify the table by its name and scrape its content.

This is what I have tried:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")


tables = soup.find_all("div", class_="mceTmpl table__wrapper")
for table in tables:
    print("NEXT-------------------------------------------")
    print(table, end="\n" * 2)

HedgeHog · Accepted Answer

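You could use pandas.read_html(), which is a common choice for scraping tables. It parses every <table> on the page into a list of DataFrames (using lxml by default, or BeautifulSoup with html5lib as a fallback), so you can then select your table from that list by index.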

Alternatively, use CSS selectors:

soup.select('h3:has(a[name="metrotram"]) + div > div:first-of-type tr')
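To show what that selector targets, here is a minimal sketch run against a small snippet of markup. SAMPLE_HTML is a hypothetical, simplified stand-in modelled on the page, not copied from it: it assumes each title is an <h3> holding a named anchor such as <a name="metrotram">, followed by a wrapper <div> whose first child <div> contains the table. The helper name tram_rows is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup assumed to mirror the page layout.
# The "metrotrain" section is a placeholder distractor, not real data.
SAMPLE_HTML = """
<h3><a name="metrotrain"></a>Metropolitan train</h3>
<div class="mceTmpl table__wrapper"><div><table>
  <tr><th></th><th>placeholder</th></tr>
  <tr><td>placeholder row</td><td>-</td></tr>
</table></div></div>
<h3><a name="metrotram"></a>Metropolitan tram</h3>
<div class="mceTmpl table__wrapper"><div><table>
  <tr><th></th><th>% timetable delivered</th><th>% services on-time at timing points</th></tr>
  <tr><td>Sunday, 5 February 2023</td><td>99.4%</td><td>83.3%</td></tr>
  <tr><td>Saturday, 4 February 2023</td><td>99.4%</td><td>81.8%</td></tr>
</table></div></div>
"""


def tram_rows(html: str) -> list[list[str]]:
    """Return each <tr> of the 'metrotram' table as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    # <h3> containing <a name="metrotram">, then its next sibling <div>,
    # then that div's first child <div>, then every <tr> inside it.
    rows = soup.select('h3:has(a[name="metrotram"]) + div > div:first-of-type tr')
    # Collect header and data cells alike, stripping surrounding whitespace
    return [[cell.get_text(strip=True) for cell in row.find_all(["th", "td"])] for row in rows]


for row in tram_rows(SAMPLE_HTML):
    print(row)
```

If the live page differs from this assumed layout, the selector will need adjusting; checking that soup.select('h3:has(a[name="metrotram"])') returns a match is a quick first test.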

Example

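import io

import pandas as pd
import requests

URL = 'https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/'

# Send a user-agent header along with the request
html = requests.get(URL, headers={'user-agent': 'some agent'}).text

# read_html returns one DataFrame per <table>; header=0 uses the first row
# as column names. Wrapping the string in StringIO avoids the deprecation
# warning that pandas >= 2.1 emits for literal HTML input.
tables = pd.read_html(io.StringIO(html), header=0)
tables[1]  # the 'Metropolitan tram' table was at index 1 when this was written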

Output

	Unnamed: 0	% timetable delivered	% services on-time at timing points
0	Sunday, 5 February 2023	99.4%	83.3%
1	Saturday, 4 February 2023	99.4%	81.8%
2	Friday, 3 February 2023	98.4%	79.7%
3	Thursday, 2 February 2023	97.9%	72.8%
4	Wednesday, 1 February 2023	98.9%	79.1%
5	Tuesday, 31 January 2023	99.0%	81.4%
6	Monday, 30 January 2023	99.3%	90.2%
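Selecting by index breaks if the page adds or reorders tables. A hedged alternative is to locate the table by its heading text and build the DataFrame from the matched rows. This sketch assumes the title 'Metropolitan tram' is the full visible text of an <h3> that precedes the table; SAMPLE_HTML and table_by_heading are hypothetical names, and the markup is a simplified stand-in for the real page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the train section is a placeholder distractor.
SAMPLE_HTML = """
<h3><a name="metrotrain"></a>Metropolitan train</h3>
<div><table><tr><th></th><th>placeholder</th></tr></table></div>
<h3><a name="metrotram"></a>Metropolitan tram</h3>
<div><table>
  <tr><th></th><th>% timetable delivered</th><th>% services on-time at timing points</th></tr>
  <tr><td>Sunday, 5 February 2023</td><td>99.4%</td><td>83.3%</td></tr>
  <tr><td>Saturday, 4 February 2023</td><td>99.4%</td><td>81.8%</td></tr>
</table></div>
"""


def table_by_heading(html: str, heading: str) -> pd.DataFrame:
    """Return the first table after the <h3> whose text equals `heading`."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on visible text; .string would be None here because the
    # <h3> also contains an <a> child.
    title = soup.find(lambda tag: tag.name == "h3" and tag.get_text(strip=True) == heading)
    if title is None:
        raise ValueError(f"no <h3> with text {heading!r}")
    # First <table> that follows the heading in document order
    table = title.find_next("table")
    rows = [[c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")]
    # First row holds the column names, similar to header=0 in read_html
    return pd.DataFrame(rows[1:], columns=rows[0])


print(table_by_heading(SAMPLE_HTML, "Metropolitan tram"))
```

On the live page you would pass the fetched HTML text instead of SAMPLE_HTML. Cell values stay as strings such as '99.4%'; convert them yourself if you need numbers.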
