Salim Chorfi

Reputation: 15

BeautifulSoup for a div inside div(s) in Python

I looked at different posts with similar questions, but I'm not able to find the particular value I'm looking for.

I'm using this code:

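import bs4 as bs
import urllib2  # Python 2 only; Python 3 moved this into urllib.request

# Download the raw HTML of the page and parse it with the lxml parser
response = urllib2.urlopen('https://www.meteomedia.com/ca/meteo/quebec/montreal?wx_auto_reload=')
html = response.read()
soup = bs.BeautifulSoup(html, 'lxml')

# Print the text of the outer container div (get_text() includes all nested divs)
for div in soup.find_all('div', id="main-container"):
    print(div.get_text())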

I'm not able to find this particular line (the one highlighted): https://i.sstatic.net/OIlrc.png

I know I could use an API, but I'm trying to understand how web scraping works for future projects. Thank you!

Upvotes: 1

Views: 689

Answers (2)

Martin Evans

Reputation: 46779

The website you have chosen probably creates most of its content with JavaScript, so a simple Python request will not give you the final HTML. A workaround is to use something like Selenium to remote-control a browser and let the browser render the HTML; Python can then extract the final HTML via Selenium.
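To see why the value goes missing, here is a small offline sketch. It uses the standard library's html.parser as a stand-in for BeautifulSoup, and both HTML strings are hypothetical: RAW_HTML mimics what a plain request downloads (an empty placeholder that a script fills in later), while RENDERED_HTML mimics the page after a browser has run that script.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside the <div> with a given id, including nested divs."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target div (counts nested divs)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def div_text(html, target_id):
    """Return the whitespace-joined text found inside the target div."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical markup as downloaded: the span is empty until JavaScript fills it in.
RAW_HTML = """
<div id="main-container">
  <div class="obs">Temperature: <span id="temp"></span></div>
</div>
<script>document.getElementById("temp").textContent = "-10";</script>
"""

# Hypothetical markup after a browser has executed the script.
RENDERED_HTML = """
<div id="main-container">
  <div class="obs">Temperature: <span id="temp">-10</span></div>
</div>
"""

print(repr(div_text(RAW_HTML, "main-container")))       # 'Temperature:'
print(repr(div_text(RENDERED_HTML, "main-container")))  # 'Temperature: -10'
```

The parsing code is identical in both cases; only the input differs. With Selenium, the rendered markup would typically come from driver.page_source after driver.get(url), and it could then be handed to BeautifulSoup exactly as in the question.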

As you already mentioned, in this case it would make more sense to extract the information from the API that the page itself uses, for example:

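import json
import urllib2  # Python 2 only; Python 3 moved this into urllib.request

# JSON endpoint that the page itself uses for its weather data
response = urllib2.urlopen('https://www.meteomedia.com/api/data/caqc0363/cm?ts=1012')
json_response = json.loads(response.read())
print(json_response['obs']['t'])  # current temperature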

This would display the current temperature as:

-10
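urllib2 only exists in Python 2; in Python 3 its functionality was folded into urllib.request. A rough Python 3 equivalent is sketched below. The endpoint and the 'obs'/'t' keys are taken from the answer above and may no longer be served by the site, and the helper names are illustrative.

```python
import json
import urllib.request

# Endpoint from the answer above; it may have changed since.
API_URL = "https://www.meteomedia.com/api/data/caqc0363/cm?ts=1012"


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (json.loads accepts bytes on Python 3.6+)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read())


def current_temperature(payload):
    """Pick the current temperature out of the decoded response, assuming the obs/t layout."""
    return payload["obs"]["t"]


# Usage (needs network access):
# print(current_temperature(fetch_json(API_URL)))
```

Keeping the download and the key lookup in separate functions makes it easy to test the parsing step on a saved response without hitting the site.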

If you print json_response, you will be able to see all of the information that is available.
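A large response is easier to search if you flatten it into key paths first; that is one way to discover that the temperature sits at json_response['obs']['t']. This is a small standard-library sketch, and the sample dictionary is hypothetical, not real output from the API.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf) pairs for every leaf in nested dicts/lists, e.g. ('obs.t', -10)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Hypothetical, heavily trimmed response used only for illustration.
sample = {"obs": {"t": -10, "wind": {"speed": 15}}, "forecast": [{"t": -8}, {"t": -12}]}

for path, leaf in flatten(sample):
    print(path, "=", leaf)
# obs.t = -10
# obs.wind.speed = 15
# forecast[0].t = -8
# forecast[1].t = -12
```

To locate a number you can see on the web page, filter on it: [path for path, leaf in flatten(sample) if leaf == -10] gives ['obs.t'].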

Upvotes: 1

宏杰李

Reputation: 12168

Open Chrome Developer Tools, switch to the Network tab and refresh the page.


You can find the request that returns the data under the XHR filter; then use Python to make a request to that URL.
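Once you have copied the request URL from the Network tab, the call can be replayed from Python. The sketch below is an assumption-laden starting point: some endpoints reject requests that do not look like they came from a browser, so it sends a browser-like User-Agent and an optional Referer, but whether a particular site needs these headers is not known here. The helper names and the example.com URLs in the check are hypothetical.

```python
import json
import urllib.request

# Hypothetical browser-like headers; whether an endpoint requires them is an assumption.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
}


def build_xhr_request(url, referer=None):
    """Build a GET request that mimics the XHR call seen in the Network tab."""
    headers = dict(DEFAULT_HEADERS)
    if referer:
        headers["Referer"] = referer  # the page that issued the XHR call
    return urllib.request.Request(url, headers=headers)


def fetch_xhr_json(url, referer=None, timeout=10):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(build_xhr_request(url, referer), timeout=timeout) as response:
        return json.loads(response.read())
```

Building the Request separately lets you inspect exactly what will be sent (for example with req.header_items()) and compare it against the request headers shown in DevTools before making any network call.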

Upvotes: 0
