I have tried everything. The request succeeds and the response contains what I expect, but find_all returns an empty list when I search for a div with a specific class that definitely exists on the web page. I have searched everywhere, but nothing seems to work.
Here's my code:
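import requests
import lxml  # not used directly; BeautifulSoup just needs lxml installed for the 'lxml' parser
from bs4 import BeautifulSoup
baseurl = 'https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB'
response = requests.get(baseurl)
soup = BeautifulSoup(response.content, 'lxml')
# look up every div with these classes; this prints [] even though the div exists on the page
productlist = soup.find_all("div", class_="nsv-product ns_b_a")
print(productlist)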
I am essentially trying to build a script that emails me when items in an e-shop change from "ikke på lager" (out of stock) to "på lager" (in stock).
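One detail worth getting right before wiring up the email: "ikke på lager" contains "på lager" as a substring, so a plain "på lager" in text check would report out-of-stock items as available. Below is a minimal sketch. It assumes the stock status can be read as a text string using the two Danish phrases above (the exact wording and capitalisation on the page are assumptions), and it builds the alert with the standard library's email.message.EmailMessage. The function names and the e-mail addresses are hypothetical.

```python
from email.message import EmailMessage


def is_in_stock(status_text: str) -> bool:
    """Return True for 'på lager', False for 'ikke på lager' or anything else.

    Assumes the shop shows one of those two Danish phrases. The negative
    phrase is checked first because it contains the positive one.
    """
    # lowercase and collapse runs of whitespace before comparing
    normalized = " ".join(status_text.lower().split())
    if "ikke på lager" in normalized:
        return False
    return "på lager" in normalized


def build_alert(product_name: str, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a notification email; addresses are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"In stock: {product_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{product_name} is now marked as 'på lager' (in stock).")
    return msg


print(is_in_stock("På lager"))       # True
print(is_in_stock("Ikke på lager"))  # False
```

Sending the message would then be a matter of smtplib.SMTP(host).send_message(msg) with your own mail server's details.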
You will probably need Selenium for this one.
As far as I know, the div is rendered by JavaScript.
BeautifulSoup only parses the HTML that requests downloads; it does not run JavaScript, so it never sees JS-rendered content.
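from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.firefox import GeckoDriverManager
options = webdriver.FirefoxOptions()
options.add_argument('-headless')  # run Firefox without a visible window
# current Selenium 4 releases take the driver path through a Service object
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()), options=options)
driver.get('https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB')
# the product divs are added by JavaScript, so wait (up to 10 s) until they exist
k = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class='nsv-product ns_b_a']")))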
Put the rest of your code below that snippet: processing the elements, saving them to your database, and so on.
Note: the snippet is only an example and has its limitations (for instance, you may want to use Chrome instead of Firefox), so tweak it to your own needs.
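To check the diagnosis for yourself (that the div only appears once JavaScript runs), it helps to look at the raw HTML the way requests and BeautifulSoup see it. This is a minimal sketch using only the standard library's html.parser. The two HTML strings are made-up stand-ins, not the real atea.dk markup, and count_product_divs is a hypothetical helper; with the real page you would pass response.text instead.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count <div> start tags whose class attribute contains every wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attribute values can be None, so fall back to an empty string
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_product_divs(html: str) -> int:
    parser = DivClassCounter(["nsv-product", "ns_b_a"])
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: the div is part of the HTML the server sends.
STATIC_HTML = '<div class="nsv-product ns_b_a">MacBook</div>'
# Hypothetical markup: the div would only be created by a script in a browser.
# A parser treats <script> content as text, so the inner <div> is never seen.
JS_HTML = (
    '<div id="list"></div>'
    '<script>document.getElementById("list").innerHTML = '
    '\'<div class="nsv-product ns_b_a">MacBook</div>\';</script>'
)

print(count_product_divs(STATIC_HTML))  # 1
print(count_product_divs(JS_HTML))      # 0
```

If the real page source gives 0 while the browser's element inspector shows the div, the content is being rendered client-side.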
The product data is already in the HTML the server returns, just not inside the div you are looking for. View the page source (Ctrl + U on Windows) and search for window.netset.model.productListDataModel; it is assigned inside a <script> tag:
<script>window.netset.model.productListDataModel = {....}
Parse that embedded JSON string and you get the product listing you want (8 products per page). Because no JavaScript has to run, requests alone is enough.
Here is the code:
import re, json
import requests
baseurl = 'https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB'
# browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
response = requests.get(baseurl, headers=headers)
# print the response status (expect <Response [200]>)
print(response)
# regex to find the JSON string assigned inside the <script> tag
match = re.search(r'productListDataModel\s+=\s+(\{.*?\});\n', response.text)
if match is None:
    raise ValueError('productListDataModel not found in the page source')
product_list_raw_str = match.group(1).strip()
# parse the JSON string
products_json = json.loads(product_list_raw_str)
# find the product list (one page, 8 products per page)
product_list = products_json['response']['productlistrows']['productrows']
# iterate the product list
for product in product_list:
    print(product)
It outputs:
<Response [200]>
{'buyable': True, 'ispackage': False, 'artcolumndata': 'MK1E3DK/A', 'quantityDisabled': False, 'rowid': '5418526',.........
..... 'showAddToMyList': False, 'showAddToCompareList': True, 'showbuybutton': True}
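The regex above relies on the whole object sitting on one line and ending in }; followed by a newline. A slightly more tolerant variant, sketched here as an alternative rather than what the answer uses, only locates the start of the assignment and lets json.JSONDecoder.raw_decode decide where the JSON object ends. The helper name extract_product_rows is hypothetical, and SAMPLE is a made-up, heavily trimmed stand-in for the real page that reuses only the ['response']['productlistrows']['productrows'] key path and two fields from the output above.

```python
import json
import re


def extract_product_rows(html: str) -> list:
    """Return the productrows list from an embedded productListDataModel object."""
    match = re.search(r"productListDataModel\s*=\s*", html)
    if match is None:
        raise ValueError("productListDataModel not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the ';', '</script>', and so on).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data["response"]["productlistrows"]["productrows"]


SAMPLE = (
    '<script>window.netset.model.productListDataModel = '
    '{"response": {"productlistrows": {"productrows": '
    '[{"rowid": "5418526", "buyable": true}]}}};</script>'
)

for row in extract_product_rows(SAMPLE):
    print(row["rowid"], row["buyable"])  # 5418526 True
```

With the real page you would call extract_product_rows(response.text). Like the original, this still assumes the embedded object is valid JSON.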