Reputation: 263
I'm trying to write a simple Python web scraping script to extract specific values from a table on a web page, but the results aren't coming out in the correct format. I guess I'm doing something wrong with the soup.find call.
import requests
from bs4 import BeautifulSoup
URL = 'https://www.health.nsw.gov.au/news/Pages/20200329_01.aspx'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('td', class_='moh-rteTableFooterOddCol-6')
print(results)
I'm expecting the value 93,099, but print outputs the whole tag:
<td class="moh-rteTableFooterOddCol-6">93,099</td>
I'm not able to convert results into a string either.
Upvotes: 1
Views: 2179
Reputation: 661
Changing print(results)
to print(results.string)
will display in the console:
93,099
Is this what you wanted?
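For reference, here is a minimal sketch of reading the cell text without printing the whole tag. It uses the footer row markup quoted in this thread as an inline sample (an assumption, so that it runs without a network request), and it also shows get_text(), which is an alternative to .string and not something the original code used.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the footer row quoted in this thread,
# wrapped in a <table> so the example needs no network access.
HTML = (
    '<table><tr class="moh-rteTableFooterRow-6">'
    '<td class="moh-rteTableFooterEvenCol-6">Total</td>\n'
    '<td class="moh-rteTableFooterOddCol-6">93,099</td></tr></table>'
)

soup = BeautifulSoup(HTML, "html.parser")
cell = soup.find("td", class_="moh-rteTableFooterOddCol-6")

# find() returns None when nothing matches, so guard before reading text.
# .string is the tag's single text child (None if the tag has several children).
cell_string = cell.string if cell is not None else None
# get_text() joins all nested text and always returns a plain str.
cell_text = cell.get_text(strip=True) if cell is not None else ""

print(cell_string)
print(cell_text)
```

If the cell could ever contain nested tags, get_text() is probably the safer choice, since .string returns None in that case.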
Upvotes: 1
Reputation: 1395
You can access it using the contents attribute.
import requests
from bs4 import BeautifulSoup
URL = 'https://www.health.nsw.gov.au/news/Pages/20200329_01.aspx'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('td', class_='moh-rteTableFooterOddCol-6')
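# find() returns None when nothing matches, so check before indexing
if results and results.contents:
    # Strip the thousands separator, then convert the text to an integer
    print(int(results.contents[0].replace(',', '')))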
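The conversion step can also be pulled out into a small helper. This is only a sketch: parse_count is a hypothetical name, and it assumes the cell holds a non-negative whole number with commas as thousands separators, as in the 93,099 value above.

```python
def parse_count(text: str) -> int:
    """Convert a display string such as '93,099' to an int.

    Hypothetical helper: assumes a non-negative whole number that uses
    commas as thousands separators.
    """
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdecimal():
        # Fail loudly on unexpected cell content (e.g. 'n/a' or an empty cell).
        raise ValueError(f"not a whole number: {text!r}")
    return int(cleaned)


print(parse_count("93,099"))
```

Raising on unexpected input makes a changed page layout show up as an error instead of a silently wrong number.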
In the future, whenever you do not know the attributes of a returned object, use __dict__ to inspect it.
As an example,
>>> results.__dict__
{'attrs': {'class': ['moh-rteTableFooterOddCol-6']},
'can_be_empty_element': False,
'cdata_list_attributes': {'*': ['class', 'accesskey', 'dropzone'],
'a': ['rel', 'rev'],
'area': ['rel'],
'form': ['accept-charset'],
'icon': ['sizes'],
'iframe': ['sandbox'],
'link': ['rel', 'rev'],
'object': ['archive'],
'output': ['for'],
'td': ['headers'],
'th': ['headers']},
'contents': ['93,099'],
'hidden': False,
'known_xml': False,
'name': 'td',
'namespace': None,
'next_element': '93,099',
'next_sibling': None,
'parent': <tr class="moh-rteTableFooterRow-6"><td class="moh-rteTableFooterEvenCol-6">Total</td>
<td class="moh-rteTableFooterOddCol-6">93,099</td></tr>,
'parser_class': bs4.BeautifulSoup,
'prefix': None,
'preserve_whitespace_tags': {'pre', 'textarea'},
'previous_element': '\n',
'previous_sibling': '\n',
'sourceline': 1075,
'sourcepos': 0}
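A related sketch, using a one-cell inline sample as an assumption: the built-in vars() returns the same dictionary as __dict__, while dir() also lists methods and properties defined on the class. That matters here because names such as string and get_text do not appear in the __dict__ dump above.

```python
from bs4 import BeautifulSoup

# Inline sample cell (assumption: mirrors the <td> quoted in this thread).
soup = BeautifulSoup(
    '<td class="moh-rteTableFooterOddCol-6">93,099</td>', "html.parser"
)
cell = soup.find("td")

# vars(obj) returns obj.__dict__, i.e. the instance attributes only.
instance_attrs = sorted(vars(cell))

# dir(obj) also includes methods and properties defined on the class.
public_names = [name for name in dir(cell) if not name.startswith("_")]

print("contents" in instance_attrs)
print("get_text" in public_names)
print("string" in public_names)
```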
Upvotes: 2