Exclude a Span Class from a table in BeautifulSoup

Question

The following code extracts data from a specific table on a webpage:

import requests
from bs4 import BeautifulSoup

url = "XYZ"
sector_response = requests.get(url)
soup = BeautifulSoup(sector_response.content, 'lxml')

# Find the desired table
table = soup.find('table', attrs={'class': 'snapshot-data-tbl'})
headings = [th.get_text() for th in table.find("tr").find_all("th")]
for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
# Exclude the 'Weighting Recommendations' tuple
new_dataset = [i for i in dataset if i[0] != 'Weighting Recommendations']
for item in new_dataset:
    print(item)

However, each cell in the body of the table contains a timestamp span that I don't need. How can I exclude these?

For example:


-0.39%
04:20 PM ET 09/28/2018

Current output:

('Last % Change', '
-0.39%
04:20 PM ET 09/28/2018
')

Desired output:

('Last % Change', -0.39)

ARR · Accepted Answer

If the class of the span that holds the value is always "negative", you could do the following:

for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.find('span', {"class": "negative"}).get_text() for td in row.find_all("td"))))

Or, if the class is not always "negative", you could take the first span in each cell regardless of its class:

for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.find('span').get_text() for td in row.find_all("td"))))
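
Another option, if the timestamp sits in a span with its own class, is to remove that span from the cell before reading the text. The sketch below is only an illustration: the page's real markup is not shown in the question, so the HTML snippet and the class name "timestamp" are assumptions, and html.parser is used just to keep the example free of the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup; the class name "timestamp" is an assumption.
HTML = """
<table><tr>
  <td>
    <span class="negative">-0.39%</span>
    <span class="timestamp">04:20 PM ET 09/28/2018</span>
  </td>
</tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
td = soup.find("td")

# Remove every timestamp span from the tree, then read what is left of the cell.
for span in td.find_all("span", class_="timestamp"):
    span.decompose()

text = td.get_text(strip=True)
print(text)
```

Passing strip=True to get_text() also drops the leading and trailing newlines that appear in the current output.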

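Also, to let your program run smoothly, try to handle the errors you can anticipate. For example, what if a td contains no span? td.find('span') then returns None, and calling .get_text() on it raises an AttributeError, so the program will just crash.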
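
A minimal sketch of that kind of defensive handling might look like the following. The HTML snippet, its class names, and the helper names cell_value and to_number are all made up for illustration, since the real page is not shown; html.parser is used only to avoid depending on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names "negative" and "timestamp" are assumptions.
HTML = """
<table class="snapshot-data-tbl">
  <tr><th>Last % Change</th><th>Sector</th></tr>
  <tr>
    <td><span class="negative">-0.39%</span><span class="timestamp">04:20 PM ET 09/28/2018</span></td>
    <td>Energy</td>
  </tr>
</table>
"""

def cell_value(td):
    """Return the text of the first span in td, or the cell's own text if it has no span."""
    span = td.find("span")
    if span is None:
        return td.get_text(strip=True)
    return span.get_text(strip=True)

def to_number(text):
    """Convert strings such as '-0.39%' to a float; return other text unchanged."""
    try:
        return float(text.rstrip("%"))
    except ValueError:
        return text

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", attrs={"class": "snapshot-data-tbl"})
headings = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]

dataset = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if not cells:
        continue  # the header row has th cells only
    dataset.extend(zip(headings, (to_number(cell_value(td)) for td in cells)))

for item in dataset:
    print(item)
```

With the sample markup above, the first tuple printed is ('Last % Change', -0.39), which matches the desired output in the question. Cells without a span fall back to their own text instead of raising an error.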
