I want to load the list of images on this page in Python. When I opened the page in my browser (Chrome or Safari) and inspected it with the dev tools, the images were listed as <img class="grid-item--image">....
However, when I tried to parse the page in Python, the result was different: I got the images as <img class="carousel--image"...>, whereas soup.findAll("img", "grid-item--image") returned an empty list. Also, when I tried saving the images using their srcset attribute, most of the saved images were NOT the ones shown on the page.
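For reference, a srcset value is a comma-separated list of image candidates, each a URL optionally followed by a width descriptor (e.g. 195w) or a pixel-density descriptor (e.g. 2x), so saving an image "via srcset" means picking one candidate. A minimal sketch, assuming the URLs themselves contain no commas (the HTML spec allows them, so this is a simplification); parse_srcset and largest_candidate are hypothetical helper names:

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, descriptor) pairs.

    Simplification: assumes candidates are separated by commas and that
    no URL contains a comma. A missing descriptor defaults to "1x".
    """
    candidates = []
    for part in srcset.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries, e.g. from a trailing comma
        pieces = part.split()
        url = pieces[0]
        descriptor = pieces[1] if len(pieces) > 1 else "1x"
        candidates.append((url, descriptor))
    return candidates


def largest_candidate(srcset):
    """Return the URL with the largest width ("w") descriptor.

    Falls back to the last candidate when no width descriptors are present,
    and returns None for an empty srcset.
    """
    candidates = parse_srcset(srcset)
    widths = [(int(d[:-1]), url) for url, d in candidates if d.endswith("w")]
    if widths:
        return max(widths)[1]
    return candidates[-1][0] if candidates else None
```

For example, largest_candidate("a.jpg 195w, b.jpg 390w") returns "b.jpg". This only helps once you are reading the right <img> elements; it does not explain why the static HTML contains different images.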
I think the page uses some sort of technique when rendering. How can I parse pages like this successfully?
I used BeautifulSoup 4 on Python 3.5. I loaded the page as follows:
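import requests
from bs4 import BeautifulSoup

def load_page(url):  # function name is illustrative
    # requests returns only the HTML sent by the server; no JavaScript runs
    html = requests.get(url).content  # bytes, so from_encoding is actually used
    soup = BeautifulSoup(html, "html.parser", from_encoding="utf-8")
    return soup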
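One way to check whether the difference comes from client-side rendering is to count which <img> classes appear in the raw HTML that requests returns. The sketch below uses the standard-library html.parser instead of BeautifulSoup so it has no extra dependencies; ImgClassCounter and img_classes are illustrative names, not part of any library:

```python
from collections import Counter
from html.parser import HTMLParser


class ImgClassCounter(HTMLParser):
    """Count the class names used on <img> tags in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag != "img":
            return
        for name, value in attrs:
            if name == "class" and value:
                # An element can carry several space-separated classes
                self.counts.update(value.split())


def img_classes(html):
    """Return a Counter mapping each <img> class name to its occurrence count."""
    parser = ImgClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Going by the results described above, img_classes(requests.get(url).text) on this page would be expected to show carousel--image but no grid-item--image, which would suggest the grid is built by JavaScript after the page loads.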
You would do better to use something like Selenium for this, as follows:
from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Firefox()
try:
    # Let Firefox load the page and run its JavaScript
    browser.get("http://www.vogue.com/fashion-shows/fall-2016-menswear/fendi#collection")
    html_source = browser.page_source  # the HTML as rendered by the browser
finally:
    browser.quit()  # always close the browser window

soup = BeautifulSoup(html_source, "html.parser")
for item in soup.find_all("img", {"class": "grid-item--image"}):
    print(item.get('srcset'))
This would display the following kind of output:
http://assets.vogue.com/photos/569d37e434324c316bd70f04/master/w_195/_FEN0016.jpg
http://assets.vogue.com/photos/569d37e5d928983d20a78e4f/master/w_195/_FEN0027.jpg
http://assets.vogue.com/photos/569d37e834324c316bd70f0a/master/w_195/_FEN0041.jpg
http://assets.vogue.com/photos/569d37e334324c316bd70efe/master/w_195/_FEN0049.jpg
http://assets.vogue.com/photos/569d37e702e08d8957a11e32/master/w_195/_FEN0059.jpg
...
...
...
http://assets.vogue.com/photos/569d3836486d6d3e20ae9625/master/w_195/_FEN0616.jpg
http://assets.vogue.com/photos/569d381834324c316bd70f3b/master/w_195/_FEN0634.jpg
http://assets.vogue.com/photos/569d3829fa6d6c9057f91d2a/master/w_195/_FEN0649.jpg
http://assets.vogue.com/photos/569d382234324c316bd70f41/master/w_195/_FEN0663.jpg
http://assets.vogue.com/photos/569d382b7dcd2a8a57748d05/master/w_195/_FEN0678.jpg
http://assets.vogue.com/photos/569d381334324c316bd70f2f/master/w_195/_FEN0690.jpg
http://assets.vogue.com/photos/569d382dd928983d20a78eb1/master/w_195/_FEN0846.jpg
This lets the page render fully inside a real browser, including the JavaScript that builds the image grid, and the resulting HTML can then be parsed with BeautifulSoup as usual.
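page_source is read as soon as get() returns, which normally means the document has finished loading; content that scripts add afterwards may not be there yet. A slightly more defensive variant, assuming Selenium and geckodriver are installed, waits explicitly for the grid images before reading the HTML. The function name fetch_rendered_html, the default CSS selector and the 15-second timeout are my own choices, not part of the answer above:

```python
def fetch_rendered_html(url, css_selector="img.grid-item--image", timeout=15):
    """Load url in Firefox, wait until css_selector matches, return the rendered HTML.

    Raises selenium.common.exceptions.TimeoutException if nothing matches
    within `timeout` seconds.
    """
    # Imported inside the function so this module can be imported
    # even on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Block until at least one matching element is present in the DOM
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()
```

The returned string can be passed to BeautifulSoup(html, "html.parser") and searched with find_all exactly as in the answer. The page's markup may have changed since this was written, so the selector may need adjusting.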