Reputation: 712
I am trying to scrape data from the tables at https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/ Specifically, I want the 'Metropolitan tram' table. However, the HTML elements aren't well structured, and I am unsure how to identify the table by name and scrape its content.
This is what I have tried:
import requests
from bs4 import BeautifulSoup
URL = "https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
tables = soup.find_all("div", class_="mceTmpl table__wrapper")
for table in tables:
    print("NEXT-------------------------------------------")
    print(table, end="\n"*2)
Upvotes: 0
Views: 54
Reputation: 25048
You can use pandas.read_html(), which is good practice for scraping tables: it parses the page with lxml or BeautifulSoup under the hood and returns a list of DataFrames, from which you select your table by index.
Alternatively, use CSS selectors:
soup.select('h3:has(a[name="metrotram"]) + div > div:first-of-type tr')
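from io import StringIO

import pandas as pd
import requests

html = requests.get(
    'https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/',
    headers={'user-agent': 'some agent'}
).text

# read_html returns a list of DataFrames; [1] picks the second table on the page.
# Wrapping the text in StringIO avoids the deprecation warning that newer pandas
# versions raise for literal HTML strings.
pd.read_html(StringIO(html), header=0)[1]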
| | Unnamed: 0 | % timetable delivered | % services on-time at timing points |
|---|---|---|---|
| 0 | Sunday, 5 February 2023 | 99.4% | 83.3% |
| 1 | Saturday, 4 February 2023 | 99.4% | 81.8% |
| 2 | Friday, 3 February 2023 | 98.4% | 79.7% |
| 3 | Thursday, 2 February 2023 | 97.9% | 72.8% |
| 4 | Wednesday, 1 February 2023 | 98.9% | 79.1% |
| 5 | Tuesday, 31 January 2023 | 99.0% | 81.4% |
| 6 | Monday, 30 January 2023 | 99.3% | 90.2% |
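If you would rather pick the table by its name than by its position in the list, the CSS-selector alternative can be sketched roughly as follows. This is only an illustration: SAMPLE_HTML is a hypothetical, simplified stand-in for the page markup, inferred from the selector itself (the real page may be structured differently), and table_rows is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup inferred from the selector; the real page
# may differ. The train rows are placeholders, not real figures.
SAMPLE_HTML = """
<h3><a name="metrotrain"></a>Metropolitan train</h3>
<div class="mceTmpl table__wrapper"><div><table>
  <tr><th></th><th>% timetable delivered</th><th>% services on-time</th></tr>
  <tr><td>(placeholder)</td><td>(placeholder)</td><td>(placeholder)</td></tr>
</table></div></div>
<h3><a name="metrotram"></a>Metropolitan tram</h3>
<div class="mceTmpl table__wrapper"><div><table>
  <tr><th></th><th>% timetable delivered</th><th>% services on-time at timing points</th></tr>
  <tr><td>Sunday, 5 February 2023</td><td>99.4%</td><td>83.3%</td></tr>
  <tr><td>Saturday, 4 February 2023</td><td>99.4%</td><td>81.8%</td></tr>
</table></div></div>
"""


def table_rows(soup, anchor_name):
    """Return the cell text of each row in the table that follows the
    <h3> heading containing <a name="anchor_name">."""
    selector = (
        f'h3:has(a[name="{anchor_name}"]) + div > div:first-of-type tr'
    )
    return [
        [cell.get_text(strip=True) for cell in tr.select("th, td")]
        for tr in soup.select(selector)
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = table_rows(soup, "metrotram")
for row in rows:
    print(row)
```

Against the live page you would build soup from the requests response instead of SAMPLE_HTML. Anchoring on the heading's name attribute means the lookup should keep working if tables are reordered, whereas an index such as [1] would silently return a different table.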
Upvotes: 2