Reputation: 3
I am trying to get the temperature data from a website, but no matter what I do, I get None as the output.
Here is the code I used:
import urllib.request
from bs4 import BeautifulSoup
contenturl = "http://www.awebsite.com/"
soup = BeautifulSoup(urllib.request.urlopen(contenturl).read())
table = soup.find('table')
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = ''.join(td.find(text=True))
        print (text+"|"),
    print ()
I have been using BS with Python 3. Any help is appreciated.
Upvotes: 0
Views: 110
Reputation: 1122322
To get the temperature, find the table row with the text 'Temperature' in it:
import re
temperature_row = soup.find(text=re.compile('Temperature')).find_parent('tr')
temperature = temperature_row.find_all('td')[-1].get_text()
Demo:
>>> temperature_row = soup.find(text=re.compile('Temperature')).find_parent('tr')
>>> temperature_row.find_all('td')[-1].get_text()
'85.9°F\n'
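Note that soup.find() returns None when nothing matches, so chaining .find_parent() straight onto it fails on a page that lacks the label. Here is a minimal sketch of a guarded lookup. The SAMPLE_HTML markup and the find_value() helper are my own illustrations, not the real page or anything from the library, and string= is the newer spelling of the text= argument used above:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<table>
  <tr><td colspan="2"><big>Current Weather</big></td></tr>
  <tr><td>Temperature</td><td>85.9°F</td></tr>
  <tr><td>Humidity</td><td>50%</td></tr>
</table>
"""

def find_value(soup, label):
    """Return the text of the last cell in the row mentioning label, or None."""
    # string= replaces text= in Beautiful Soup 4.4.0 and later.
    node = soup.find(string=re.compile(label))
    if node is None:
        return None  # the label is not on the page
    row = node.find_parent('tr')
    if row is None:
        return None  # the label is not inside a table row
    cells = row.find_all('td')
    return cells[-1].get_text().strip() if cells else None

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(find_value(soup, 'Temperature'))
print(find_value(soup, 'Pressure'))
```

With the sample markup, the first call prints the temperature and the second prints None instead of raising an AttributeError.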
To get all the temperature data, I'd start by looking for the header with the 'Current Weather' text; it is wrapped in a <big> tag (ick, deprecated HTML tags). Then process every following row that has exactly two cells:
row = soup.find('big', text=re.compile(r'Current\s*Weather')).find_parent('tr')
while True:
    row = row.find_next_sibling('tr')
    cells = row.find_all('td')
    if len(cells) != 2:
        break
    label, value = (cell.get_text().strip() for cell in cells)
    print(label, value, sep=': ')
This produces:
>>> row = soup.find('big', text=re.compile(r'Current\s*Weather')).find_parent('tr')
>>> while True:
...     row = row.find_next_sibling('tr')
...     cells = row.find_all('td')
...     if len(cells) != 2:
...         break
...     label, value = (cell.get_text().strip() for cell in cells)
...     print(label, value, sep=': ')
...
Temperature: 85.9°F
Humidity: 50%
Dewpoint: 65.1°F
Wind: ESE at 7.0 mph
Barometer: 27.346 in & Falling Slowly
Today's Rain: 0.00 in
Yearly Rain: 11.49 in
Wind Chill: 85.4°F
THW Index: 87.6°F
Heat Index: 88.1°F
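If you would rather keep the readings than print them, the same walk can fill a dictionary. This is only a sketch under assumptions: the SAMPLE_HTML markup is invented to resemble the structure described above, and read_section() is a hypothetical helper name. It uses find_next_siblings(), so the loop also ends cleanly when the table runs out of rows:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup resembling the page structure (an assumption).
SAMPLE_HTML = """
<table>
  <tr><td colspan="2"><big>Current Weather</big></td></tr>
  <tr><td>Temperature</td><td>85.9°F</td></tr>
  <tr><td>Humidity</td><td>50%</td></tr>
  <tr><td colspan="2"><big>Forecast</big></td></tr>
  <tr><td>Tomorrow</td><td>Sunny</td></tr>
</table>
"""

def read_section(soup, heading_pattern):
    """Collect label/value pairs from the two-cell rows after a <big> heading."""
    header = soup.find('big', string=re.compile(heading_pattern))
    if header is None:
        return {}  # heading not found
    header_row = header.find_parent('tr')
    if header_row is None:
        return {}  # heading is not inside a table row
    readings = {}
    for row in header_row.find_next_siblings('tr'):
        cells = row.find_all('td')
        if len(cells) != 2:
            break  # the next section heading (or other layout row) ends the block
        label, value = (cell.get_text().strip() for cell in cells)
        readings[label] = value
    return readings

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
weather = read_section(soup, r'Current\s*Weather')
for label, value in weather.items():
    print(label, value, sep=': ')
```

Returning a dict makes a single lookup such as weather['Temperature'] straightforward, and an empty dict signals that the heading was missing.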
Upvotes: 2