Caerus

Reputation: 674

Exclude a Span Class from a table in BeautifulSoup

The following code extracts data from a specific table on a webpage:

import requests
from bs4 import BeautifulSoup
url="XYZ"
sector_response=requests.get(url)
soup=BeautifulSoup(sector_response.content,'lxml')

#Find the desired table
table=soup.find('table',attrs={'class': 'snapshot-data-tbl'})
headings = [th.get_text() for th in table.find("tr").find_all("th")]
for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))  
#Exclude the 'Weighting Recommendations' tuple
new_dataset=[i for i in dataset if i[0]!='Weighting Recommendations']
for item in new_dataset:
    print(item)

However, each cell in the body of the table also contains a span with the class "timestamp" that I don't need. How can I exclude these spans?

For example:

<td>
<span class="negative">-0.39%</span>
<span class="timestamp"><time>04:20 PM ET 09/28/2018</time></span>
</td>

Current output:

('Last % Change', '\n-0.39%\n04:20 PM ET 09/28/2018\n')

Desired output:

('Last % Change', -0.39)

Upvotes: 1

Views: 432

Answers (1)

ARR

Reputation: 2308

If the target span always has the class "negative", you could do the following:

for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.find('span', {'class': 'negative'}).get_text() for td in row.find_all('td'))))

Or, if the class is not always "negative", you could take the first span in each cell instead:

for row in table.find_all("tr"):
    dataset = list(zip(headings, (td.find('span').get_text() for td in row.find_all('td'))))

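Also, to keep your program running smoothly, try to handle the lookups that can fail. For example, what if a td contains no matching span?

In that case td.find() returns None, and calling .get_text() on None raises an AttributeError, so as written the script will just crash.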
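As a more defensive alternative, here is a minimal sketch that removes the timestamp spans with decompose() instead of picking a span by class, so a cell without any span no longer raises. The sample markup, the helper names cell_value and extract_rows, and the use of the built-in "html.parser" are assumptions made for illustration; the real table may be laid out differently.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the cell shown in the question; the overall
# table layout here is an assumption, not the real page.
SAMPLE_HTML = """
<table class="snapshot-data-tbl">
  <tr><th>Last % Change</th><th>Weighting Recommendations</th></tr>
  <tr>
    <td>
      <span class="negative">-0.39%</span>
      <span class="timestamp"><time>04:20 PM ET 09/28/2018</time></span>
    </td>
    <td>Underweight</td>
  </tr>
</table>
"""


def cell_value(td):
    """Return the cell text without timestamp spans (hypothetical helper).

    Text ending in '%' is converted to a float when possible.
    """
    # Drop every timestamp span from the cell before reading its text.
    for stamp in td.find_all("span", class_="timestamp"):
        stamp.decompose()
    text = td.get_text(strip=True)
    if text.endswith("%"):
        try:
            return float(text[:-1].replace(",", ""))
        except ValueError:
            pass  # not numeric after all, so return the raw text below
    return text


def extract_rows(html):
    """Return one list of (heading, value) tuples per data row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "snapshot-data-tbl"})
    if table is None:
        return []  # table missing: return nothing instead of crashing
    first_row = table.find("tr")
    if first_row is None:
        return []
    headings = [th.get_text(strip=True) for th in first_row.find_all("th")]
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            continue  # the header row has no <td> cells
        pairs = zip(headings, (cell_value(td) for td in cells))
        # Exclude the 'Weighting Recommendations' tuple, as in the question.
        rows.append([p for p in pairs if p[0] != "Weighting Recommendations"])
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        for item in row:
            print(item)
```

Because the unwanted spans are deleted rather than the wanted one being selected, this also works when the value span has a class other than "negative" or when the cell holds plain text.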

Upvotes: 1
