How to remove specific class in a row from table data using BeautifulSoup

Question

I am trying to scrape data row by row from a table.
However, in some rows a cell holds the same value twice, once under the class "show-for-medium-up" and once under "hide-for-medium-up", and the two texts are concatenated when I read the cell, which gives a repeated number. For example, if the first number is 10.837 and the second is 10.84, the resulting text for the cell is 10.83710.84. I would like to remove the second number.
How can I remove only the content of the last class, "hide-for-medium-up"?
See my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re

url = "https://uk.flightaware.com/live/flight/AZA202/history/20210224/0856Z/LIRF/EGLL/tracklog"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")

# Number of tables:
flt_tables = soup.find_all("table", attrs={"class": "prettyTable fullWidth"})
print("N. tables: ", len(flt_tables))

# Scraping first table - headers only
table1 = flt_tables[0]
# Data row by row ('tr' -> row)
table_rows = table1.find_all("tr")

header = table_rows[0] # header
table_data = table_rows[1:] # table data (excluding header)

headers = []
for item in header.find_all("th"): # loop in 'th' elements
    item = (item.text).rstrip("\n") # getting text part and removing '\n'
    headers.append(item)
print(headers)

# Scraping table data ('td')
all_rows = []
for row_num in range(len(table_data)): # A row at a time
    row = []
    for row_item in table_data[row_num].find_all("td"): #loop in 'td' elements
        # regex -> removing \xa0, \n and comma from row_item.text
        # \xa0 is a non-breaking space, \n is the newline and the comma separates thousands in numbers
        aa = re.sub("(\xa0)|(\n)|,","", row_item.text)
        row.append(aa)
    all_rows.append(row)

print(all_rows[5])

The outputs of print(headers) and print(all_rows[5]) are, respectively:

['Time (JST)JST', 'LatitudeLat', 'LongitudeLon', 'CourseDir', 'kts', 'mph', 'meters', 'Rate', 'Reporting Facility']

['Wed 17:50:3405:50PM', '42.247342.25', '10.891910.89', '← 297°', '449', '517', '7,9717971', ' ', ' FlightAware ADS-B (LIRG) ']

The part to remove is the duplicated value at the end of each cell, for example the trailing 42.25 in '42.247342.25'.

The 'tr' html code for the first data row:


Wed 17:50:3405:50PM    Departure (FCO) @ Wednesday 09:50:34 CET 


 FlightAware ADS-B  (LIRG)


Answers (1)
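
One possible approach, sketched under the assumption that each duplicated value sits in its own child element (for example a span) carrying the class "hide-for-medium-up": remove those elements from the parsed tree with decompose() before reading the cell text. The sample HTML below is invented for illustration and only mimics the structure described in the question; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question:
# each cell holds a full-precision value and a rounded duplicate.
sample_html = """
<table class="prettyTable fullWidth">
  <tr><th>Latitude</th><th>Longitude</th></tr>
  <tr>
    <td><span class="show-for-medium-up">42.2473</span><span class="hide-for-medium-up">42.25</span></td>
    <td><span class="show-for-medium-up">10.8919</span><span class="hide-for-medium-up">10.89</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Remove every element that carries the unwanted class from the tree.
for tag in soup.find_all(class_="hide-for-medium-up"):
    tag.decompose()

# Read the remaining cell text row by row, skipping the header row.
rows = []
for tr in soup.find_all("tr")[1:]:
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

Applied to the code in the question, the same loop would run on soup (or on table1) right after parsing and before the 'th' and 'td' loops, so that .text no longer includes the duplicated values.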
