Reputation: 23
I need to download a full page and parse it, but the page creates some of its elements with JavaScript. When I fetch it with urllib, I receive an HTML page without the JavaScript-generated elements. How can I solve this problem?
import urllib.request as urlib
from bs4 import BeautifulSoup
page = urlib.urlopen('https://www.example.com')
soup = BeautifulSoup(page, 'html5lib')
...
Trying:
colordiv = soup.select("div.pswp__item:nth-child(1) > div:nth-child(1) > img:nth-child(1)")[0]
With:
https://www.electrictobacconist.com/smok-nord-p5831
Upvotes: 2
Views: 1762
Reputation: 84465
You can use the browser's dev tools (Network tab) to find the request the page uses to update the colour values, then call that endpoint directly:
import requests
r = requests.get('https://www.electrictobacconist.com/ajax/get_product_options/5831').json()
colours = [item['value'] for item in r['attributes'][0]['values']]
print(colours)
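A slightly more defensive sketch of the parsing step, assuming the response keeps the shape used above (a list under `attributes`, whose first entry has a `values` list of objects with a `value` key). The helper name `extract_colours` is only illustrative; it is not part of the site's API or of requests.

```python
def extract_colours(payload):
    """Return the 'value' of each option in the first attribute.

    Returns an empty list when the expected keys are missing,
    instead of raising KeyError/IndexError.
    """
    attributes = payload.get("attributes") or []
    if not attributes:
        return []
    values = attributes[0].get("values") or []
    return [item["value"] for item in values if "value" in item]
```

With the request above, this would be called as `extract_colours(requests.get(url).json())`.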
Upvotes: 0
Reputation: 8205
Even though the page is rendered using JavaScript, the data itself arrives in an AJAX response in the background. All you have to do is make that request yourself.
import re
import requests
url = 'https://www.electrictobacconist.com/smok-nord-p5831'
# The product id is the last run of digits in the URL (5831 here)
product_id = re.findall(r'\d+', url)[-1]
r = requests.get("https://www.electrictobacconist.com/ajax/get_product_options/{}".format(product_id))
print([x['value'] for x in r.json()['attributes'][0]['values']])
Output:
['Black/Blue', 'Black/White', 'Bottle Green', 'Full Black', 'Prism Gold', 'Prism Rainbow', 'Red', 'Resin Rainbow', 'Yellow/Purple', 'Blue/Brown', 'Red/Yellow', 'Red/Green', 'Black/White Resin']
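Taking the last run of digits with `re.findall` works for this URL, but it would pick the wrong number if the URL carried a query string containing digits. A stricter sketch, assuming product pages always end in `-p<id>` (inferred only from the one URL in the question); the names `product_id_from_url` and `options_url` are hypothetical helpers:

```python
import re
from urllib.parse import urlsplit

AJAX_TEMPLATE = "https://www.electrictobacconist.com/ajax/get_product_options/{}"


def product_id_from_url(url):
    """Extract the numeric id from a product URL whose path ends in '-p<digits>'."""
    path = urlsplit(url).path  # drop any query string or fragment first
    match = re.search(r"-p(\d+)/?$", path)
    if match is None:
        raise ValueError("no product id found in {!r}".format(url))
    return match.group(1)


def options_url(url):
    """Build the AJAX endpoint URL for a given product page URL."""
    return AJAX_TEMPLATE.format(product_id_from_url(url))
```

The result of `options_url(url)` can then be passed to `requests.get` as in the code above.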
Upvotes: 1