user3257907

Reputation: 23

How to download a page with lazy loading?

I need to download a full page and parse it, but some of its elements are created by JavaScript. When I try to do this with urllib, I receive an HTML page without the JavaScript-generated elements. How can I solve this problem?

import urllib.request as urlib
from bs4 import BeautifulSoup

page = urlib.urlopen('https://www.example.com')
soup = BeautifulSoup(page, 'html5lib')
...

Trying:

colordiv = soup.select("div.pswp__item:nth-child(1) > div:nth-child(1) > img:nth-child(1)")[0]

With:

https://www.electrictobacconist.com/smok-nord-p5831

Upvotes: 2

Views: 1762

Answers (2)

QHarr

Reputation: 84465

You can use the browser's dev tools to find the request used to update the values for colours, then make that request directly:

import requests

r = requests.get('https://www.electrictobacconist.com/ajax/get_product_options/5831').json()
colours = [item['value'] for item in r['attributes'][0]['values']]
print(colours)
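
A minimal sketch of the same extraction written defensively, assuming the response has the shape the snippet above indexes into (`attributes[0]['values'][i]['value']`). The name `extract_values` is a hypothetical helper, and the `sample` payload is made up for illustration; it is not the site's real response.

```python
def extract_values(payload, attribute_index=0):
    """Return the 'value' of each option in one attribute, or [] if the shape differs."""
    attributes = payload.get('attributes') or []
    if attribute_index >= len(attributes):
        return []
    values = attributes[attribute_index].get('values') or []
    # Skip any entry that lacks a 'value' key instead of raising KeyError
    return [item['value'] for item in values if 'value' in item]


# Made-up payload that mimics the assumed structure (not real site data)
sample = {'attributes': [{'values': [{'value': 'Red'}, {'value': 'Full Black'}]}]}
print(extract_values(sample))
```

With a live response this would be called as `extract_values(requests.get(...).json())`, so a missing or empty `attributes` list yields an empty result rather than an exception.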


Upvotes: 0

Bitto

Reputation: 8205

Even though the page is rendered using JavaScript, the data is received via an AJAX response in the background. All you have to do is make that request yourself.

import re

import requests

url = 'https://www.electrictobacconist.com/smok-nord-p5831'
# The last run of digits in the URL is the product ID (5831)
product_id = re.findall(r'\d+', url)[-1]
r = requests.get("https://www.electrictobacconist.com/ajax/get_product_options/{}".format(product_id))
print([x['value'] for x in r.json()['attributes'][0]['values']])

Output:

['Black/Blue', 'Black/White', 'Bottle Green', 'Full Black', 'Prism Gold', 'Prism Rainbow', 'Red', 'Resin Rainbow', 'Yellow/Purple', 'Blue/Brown', 'Red/Yellow', 'Red/Green', 'Black/White Resin']
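
Taking the last run of digits works for this URL, but it would pick up the wrong number if a query string or fragment with digits were appended. Below is a slightly stricter sketch. It assumes product URLs end in `-p<id>`, a pattern inferred from the single example above and not confirmed for the whole site. The helper names `product_id_from_url` and `options_url` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Endpoint taken from the answer above; the {} placeholder is the product ID
AJAX_TEMPLATE = 'https://www.electrictobacconist.com/ajax/get_product_options/{}'


def product_id_from_url(url):
    """Return the digits after a trailing '-p' in the URL path, or None if absent."""
    path = urlsplit(url).path  # drops any query string or fragment
    match = re.search(r'-p(\d+)/?$', path)
    return match.group(1) if match else None


def options_url(url):
    """Build the AJAX endpoint URL for a product page URL."""
    product_id = product_id_from_url(url)
    if product_id is None:
        raise ValueError('no product ID found in {!r}'.format(url))
    return AJAX_TEMPLATE.format(product_id)


print(options_url('https://www.electrictobacconist.com/smok-nord-p5831'))
```

The resulting URL can then be passed to `requests.get(...)` exactly as in the code above.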

Upvotes: 1
