Reputation: 63
Good day, I have an issue fetching data from the following website:
http://weather.news24.com/sa/johannesburg
I have tried both the Python requests library and urllib, with no success. Inspecting the page resources with Chrome Developer Tools, I found a URL that appears to contain the desired data, but I am still not getting the data as JSON. I would like to get the low and high temperatures, sunrise, and sunset.
It appears that an AJAX function loads the data. I tried both libraries so that I can later use the result in Django. I'm using Python 3. Any help will be appreciated.
Upvotes: 0
Views: 257
Reputation: 1832
Hope this helps:
import json
import re
import requests
from bs4 import BeautifulSoup
# This is your main url
main_url="http://weather.news24.com/sa/johannesburg"
# I am extracting city name from url. Not sure if you already have that somewhere
mycity=main_url.split('/')[-1]
# Calling your main_url
r=requests.get(main_url)
# The only valuable info in this response is the CityId for Johannesburg,
# so let's grab it using BeautifulSoup.
# Naming the parser explicitly avoids a warning and keeps results consistent.
soup=BeautifulSoup(r.content,"html.parser")
# This gives me the list of all the cities on the website and their CityId
city_list=soup.find(id="ctl00_WeatherContentHolder_ddlCity")
# Look for the city (johannesburg) within city_list.
# re.I makes the match case-insensitive.
city_as_on_website=city_list.find(text=re.compile(mycity,re.I)).parent
cityId=city_as_on_website['value']
# Now make a POST request to following url with following headers and data to get the JSON
json_url="http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"
headers={'Content-Type':'text/plain; charset=UTF-8',
'Host':'weather.news24.com',
'Origin':'http://weather.news24.com',
'Referer':main_url,
'User-Agent':'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
'X-AjaxPro-Method':'GetCurrentOne'}
payload={"cityId": cityId} # This is the cityId that we found above using BeautifulSoup
# Now send the POST request
r=requests.post(json_url,headers=headers,data=json.dumps(payload))
# r.content (or r.text, as a str in Python 3) should contain the JSON data you expect.
# However, it is not strictly valid JSON as returned: note the trailing ';/*'
# in the output below, which has to be removed before json.loads will accept it.
# The cleanup is really a separate question, but it should be manageable.
# Good luck! :-)
Out[1]: '{"__type":"TwentyFour.Services.Weather.Objects.CurrentOneReport, TwentyFour.Services.Weather, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null","Observations":[{"__type":"TwentyFour.Services.Weather.Objects.Observation, TwentyFour.Services.Weather, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null","CityName":"Lanseria Civ / Mil","Location":"Lanseria Civ / Mil","Sky":"Passing clouds","Temperature":"25.00","Humidity":"54","WindSpeed":"15","WindDirectionAbreviated":"SE","Comfort":"26","DewPoint":"15","Description":"Passing clouds. Warm.","Icon":"2","IconName"
...
...
":null,"Rainfall":"14mm","Snowfall":"*","PrecipitationProbability":"52","Icon":"22","IconName":"tstorms","Cached":false},"AstronomyReport":null,"MarineReport":null,"LocalTime":"Wed, 24 Feb 2016 17:30:27 SAST","LocalUpdateTime":"Wed, 24 Feb 2016 17:12:07 SAST","CountryName":"South Africa","TimeZone":"2","Cached":false};/*'
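The output above ends with `;/*`, which is why it does not parse as JSON directly. As a minimal sketch, assuming that trailer is the only thing wrong with the body, it can be stripped before parsing. The helper name `parse_ajaxpro_response` and the shortened `sample_response` are hypothetical and used only for illustration; the sample keeps a few fields copied from the output above.

```python
import json


def parse_ajaxpro_response(text):
    """Strip a trailing ';/*' (if present) and parse the rest as JSON.

    Hypothetical helper: it assumes the trailer seen in the output above
    is the only non-JSON part of the response body.
    """
    cleaned = text.strip()
    if cleaned.endswith("/*"):
        cleaned = cleaned[:-2]
    # Drop the statement-terminating semicolon left before the trailer.
    cleaned = cleaned.rstrip().rstrip(";")
    return json.loads(cleaned)


# Shortened stand-in for the real response; fields copied from the output above.
sample_response = (
    '{"Observations":[{"CityName":"Lanseria Civ / Mil",'
    '"Temperature":"25.00","Humidity":"54"}],'
    '"CountryName":"South Africa","Cached":false};/*'
)

report = parse_ajaxpro_response(sample_response)
first_observation = report["Observations"][0]
print(first_observation["CityName"], first_observation["Temperature"])
```

With the real request, the same helper would be called on `r.text`. Note that the values come back as strings (for example `"25.00"`), so they may need converting with `float()` before use.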
Upvotes: 1