Reputation: 15
I looked at different posts with similar questions, but I'm not able to find the particular value I'm looking for.
I'm using this code:
import bs4 as bs
import urllib2
response = urllib2.urlopen('https://www.meteomedia.com/ca/meteo/quebec/montreal?wx_auto_reload=')
html = response.read()
soup = bs.BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', id="main-container"):
    print (div.get_text())
I'm not able to find this particular line (The one highlighted): https://i.sstatic.net/OIlrc.png
I know I could use an API, but I'm trying to understand how web scraping works for future projects. Thank you!!
Upvotes: 1
Views: 689
Reputation: 46779
The website you have chosen probably creates most of its content using JavaScript, so a simple Python request will not give you all of the final HTML. A workaround would be to use something like selenium to remote-control a browser and let the browser render the HTML. Python can then extract the final HTML via selenium.
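One way to check whether a value is missing from the HTML the server sends (and is therefore probably filled in later by JavaScript) is to search the visible text of the raw response. The following is only a minimal sketch in Python 3 using the standard library; the `visible_text` helper and the sample markup are hypothetical and are not taken from the actual site.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text present in the markup without running any JavaScript."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# Hypothetical page: the temperature element is empty until a script fills it in.
sample = (
    '<div id="main-container"><h1>Montreal</h1><span id="temp"></span></div>'
    '<script>document.getElementById("temp").textContent = "-10";</script>'
)
text = visible_text(sample)
print("Montreal" in text)  # the heading is part of the static HTML
print("-10" in text)       # the value only exists inside the script source
```

If the value you are after never shows up in the static text, a plain HTTP request plus BeautifulSoup will likely not find it either, which points to either a browser-driven approach or the underlying data request.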
As already mentioned, in this case it would make more sense to extract the information using the API that is being used, for example:
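import urllib2
import json
# Request the JSON endpoint the page itself uses (Python 2)
response = urllib2.urlopen('https://www.meteomedia.com/api/data/caqc0363/cm?ts=1012')
json_response = json.loads(response.read())
# 'obs' holds the current observation; 't' is the temperature
print json_response['obs']['t']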
This would display the current temperature as:
-10
If you print json_response, you will be able to see all of the available information that could be used.
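For readers on Python 3, where `urllib2` no longer exists, roughly the same request could be sketched as follows. The URL and the `['obs']['t']` keys come from the code above; the helper names `fetch_json`, `current_temperature`, and `main`, the 10-second timeout, and the UTF-8 decoding are assumptions made for illustration, and the endpoint may have changed since this was written.

```python
import json
from urllib.request import urlopen

# URL taken from the answer above; it may no longer be valid.
API_URL = "https://www.meteomedia.com/api/data/caqc0363/cm?ts=1012"


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (assumes UTF-8 text)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def current_temperature(payload):
    """Pull the observed temperature out of the decoded payload."""
    return payload["obs"]["t"]


def main():
    # Performs the network request and prints the temperature.
    print(current_temperature(fetch_json(API_URL)))
```

Calling `main()` performs the request. Keeping the lookup in `current_temperature` separate from the download makes it easy to try the parsing on a saved response without hitting the network.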
Upvotes: 1
Reputation: 12168
Open Chrome Developer Tools, switch to the Network tab, and refresh the page. You can find the data link in the XHR tab, then use Python to make a request to it.
Upvotes: 0