Reputation: 43
I am very new to Python and am having trouble with the code below. I am trying to get either the temperature or the date from the website, but I can't seem to get any output. I have tried many variations, but still can't get it right.
Thank you for your help!
#Code below:
import requests,bs4
r = requests.get('http://www.hko.gov.hk/contente.htm')
print r.raise_for_status()
hkweather = bs4.BeautifulSoup(r.text)
print hkweather.select('div left_content fnd_day fnd_date')
Upvotes: 4
Views: 100
Reputation: 180401
Your CSS selector is incorrect: you should use a . between the tag and its CSS classes. The tags you want are the divs with the fnd_day class inside the div with the id fnd_content:
divs = hkweather.select("#fnd_content div.fnd_day")
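To make the selector difference concrete, here is a minimal sketch run against a small inline HTML string instead of the live page. The markup in SAMPLE_HTML is hypothetical: it only imitates the id and class names mentioned above, and the real page may be structured differently. It is written for Python 3 and passes the built-in "html.parser" explicitly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described above;
# the real page may differ.
SAMPLE_HTML = """
<div id="fnd_content">
  <div class="fnd_day"><div class="fnd_date">20 Jul</div></div>
  <div class="fnd_day"><div class="fnd_date">21 Jul</div></div>
</div>
<div class="fnd_day">outside the container</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Bare words are tag names and spaces mean "descendant", so this looks for
# <fnd_date> tags inside <fnd_day> inside <left_content> inside <div>.
# No such tags exist, so nothing matches.
wrong = soup.select("div left_content fnd_day fnd_date")

# "#" matches an id and "." matches a class.
days = soup.select("#fnd_content div.fnd_day")
dates = [d.get_text(strip=True) for d in soup.select("#fnd_content div.fnd_date")]

print(len(wrong), len(days), dates)
```

The third fnd_day div is skipped because it sits outside the element with id fnd_content.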
But that still won't get the data, as it is generated dynamically through an Ajax request. You can get all the data in JSON format using the code below:
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_=1468955579991"
data = requests.get(u).json()
from pprint import pprint as pp
pp(data)
That returns pretty much all the dynamic content, including the dates, temperatures, etc.
If you access the key F9D, you can see the general weather description along with all the temperatures and dates:
from pprint import pprint as pp
pp(data['F9D'])
Output:
{'BulletinDate': '20160720',
'BulletinTime': '0315',
'GeneralSituation': 'A southwesterly airstream will bring showers to the '
'coast of Guangdong today. Under the dominance of an '
'upper-air anticyclone, it will be generally fine and '
'very hot over southern China in the latter part of this '
'week and early next week.',
'NPTemp': '25',
'WeatherForecast': [{'ForecastDate': '20160720',
'ForecastIcon': 'pic53.png',
'ForecastMaxrh': '95',
'ForecastMaxtemp': '32',
'ForecastMinrh': '70',
'ForecastMintemp': '26',
'ForecastWeather': 'Sunny periods and a few showers. '
'Isolated squally thunderstorms at '
'first.',
'ForecastWind': 'South to southwest force 4.',
'IconDesc': 'Sunny Periods with A Few Showers',
'WeekDay': '3'},
{'ForecastDate': '20160721',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '4'},
{'ForecastDate': '20160722',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '5'},
{'ForecastDate': '20160723',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '34',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Fine and very hot.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '6'},
{'ForecastDate': '20160724',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '34',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Fine and very hot.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '0'},
{'ForecastDate': '20160725',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '1'},
{'ForecastDate': '20160726',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '2'},
{'ForecastDate': '20160727',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '3'},
{'ForecastDate': '20160728',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '4'}]}
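Since every value in that output is a string, a little conversion is needed before the dates and temperatures are useful. The sketch below works on a trimmed copy of the output above rather than on a live response; summarize_forecast and sample_f9d are hypothetical names chosen for illustration, and the code assumes the keys keep the shape shown.

```python
from datetime import datetime

# Two entries copied from the output above, trimmed to the fields used here.
sample_f9d = {
    "BulletinDate": "20160720",
    "WeatherForecast": [
        {"ForecastDate": "20160720", "ForecastMaxtemp": "32", "ForecastMintemp": "26"},
        {"ForecastDate": "20160721", "ForecastMaxtemp": "33", "ForecastMintemp": "28"},
    ],
}


def summarize_forecast(f9d):
    """Return (date, min_temp, max_temp) tuples from an F9D-style dict."""
    rows = []
    for day in f9d["WeatherForecast"]:
        # ForecastDate is a YYYYMMDD string; the temperatures are numeric strings.
        day_date = datetime.strptime(day["ForecastDate"], "%Y%m%d").date()
        rows.append((day_date, int(day["ForecastMintemp"]), int(day["ForecastMaxtemp"])))
    return rows


rows = summarize_forecast(sample_f9d)
for day_date, low, high in rows:
    print("{}: low {}, high {}".format(day_date.isoformat(), low, high))
```

With a real response you would pass data['F9D'] instead of sample_f9d.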
The only query string parameter is the epoch timestamp, which you can generate using the time module:
from time import time
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_={}".format(int(time()))
data = requests.get(u).json()
Not passing the timestamp also returns the same data, so I will leave you to investigate its significance.
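One detail that may help with that investigation: the value in the first URL has 13 digits, which suggests milliseconds since the epoch, whereas int(time()) yields seconds. The sketch below builds the URL with a millisecond value and decodes the original one; build_url and BASE_URL are hypothetical names, and nothing here contacts the server.

```python
from datetime import datetime, timezone
from time import time

BASE_URL = "http://www.hko.gov.hk/wxinfo/json/one_json.xml"


def build_url(timestamp_ms=None):
    """Return the JSON URL with a millisecond epoch timestamp as the _ parameter."""
    if timestamp_ms is None:
        timestamp_ms = int(time() * 1000)
    return "{}?_={}".format(BASE_URL, timestamp_ms)


# The timestamp from the URL used earlier in this answer.
original_ms = 1468955579991

# Interpreting it as milliseconds gives a moment shortly before the bulletin shown above.
requested_at = datetime.fromtimestamp(original_ms // 1000, tz=timezone.utc)

print(build_url(original_ms))
print(requested_at.isoformat())
```

Calling build_url() with no argument produces a fresh URL for the current moment.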
Upvotes: 2
Reputation: 597
I was able to select the date divs:
>>> import requests,bs4
>>> r = requests.get('http://www.hko.gov.hk/contente.htm')
>>> hkweather = bs4.BeautifulSoup(r.text)
>>> print hkweather.select('div[class="fnd_date"]')
# [<div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>]
But the text is missing. This doesn't seem like a problem with BeautifulSoup, because I looked through r.text myself and all I see is <div class="fnd_date"></div> instead of anything like <div class="fnd_date">July 20</div>.
You can check that the text isn't there using regex (although using regex with HTML is frowned upon):
>>> import re
>>> re.findall(r'<[^<>]*fnd_date[^<>]*>[^>]*>', r.text)
# [u'<div id="fnd_date" class="date"></div>', ... repeated 10 times]
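If you would rather avoid regex for this check, a rough alternative is the standard library's HTML parser. This swaps in a different technique from the one above: the sketch collects the text inside every div with a given class, so an empty string means the div arrived without content. DivTextCollector and collect_div_text are hypothetical names, the two HTML strings are stand-ins for r.text, and the code targets Python 3.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text found inside every <div> that has a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # greater than 0 while inside a matching div
        self.texts = []   # one entry per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A div nested inside a matching div: track it so the
            # closing tags are counted correctly.
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data.strip()


def collect_div_text(html, css_class):
    parser = DivTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.texts


# Stand-ins for r.text: what the server sends versus what a browser shows
# after the page's JavaScript has filled the divs in.
static_html = '<div class="fnd_date"></div><div class="fnd_date"></div>'
rendered_html = '<div class="fnd_date">July 20</div>'

static_texts = collect_div_text(static_html, "fnd_date")
rendered_texts = collect_div_text(rendered_html, "fnd_date")
print(static_texts, rendered_texts)
```

If every entry is empty for the real r.text, the dates are being filled in later by JavaScript rather than sent with the page.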
Upvotes: 0