Harry

Reputation: 93

BeautifulSoup does not find a div with a specific class

I have tried everything. The response is fine and I do get the page content I expect, but I don't understand why I receive an empty list when I search for a div with a specific class that definitely exists on the web page. I have looked everywhere, but nothing seems to work.

Here's my code:

import requests
import lxml
from bs4 import BeautifulSoup

baseurl = 'https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB'

response = requests.get(baseurl)
soup = BeautifulSoup(response.content, 'lxml')

productlist = soup.find_all("div", class_="nsv-product ns_b_a")

print(productlist)

I am essentially trying to build a script that emails me when the items from the e-shop are marked as available ("på lager") instead of unavailable ("ikke på lager").

Upvotes: 2

Views: 100

Answers (2)

khgb

Reputation: 174

You might need to use Selenium on this one.

The div is, AFAIK, rendered by JS.

BeautifulSoup does not capture JS-rendered content.

from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

# Run Firefox headlessly so no browser window opens
options = webdriver.FirefoxOptions()
options.headless = True

# webdriver_manager downloads a matching geckodriver automatically
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=options)
driver.get('https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB')

# Grab the JS-rendered product cards
products = driver.find_elements_by_xpath("//div[@class='nsv-product ns_b_a']")

Your code below that snippet should contain everything else you need, e.g. processing, saving into your database, etc.
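
For example, a rough sketch of that processing step might look like this (the "ikke på lager" text check is just my assumption about how availability is worded on the page):

# Rough sketch: inspect each product card found above (assumes `driver`
# and `products` from the snippet; the "ikke på lager" check is a guess
# about the page's wording for out-of-stock items)
for card in products:
    text = card.text
    in_stock = 'ikke på lager' not in text.lower()
    print(text[:60], '| in stock:', in_stock)

driver.quit()  # close the headless browser when done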

Note: that snippet is only an example and a bit rough around the edges (e.g. you may prefer Chrome over Firefox), so tweak it to your own needs.
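
If you do want to use Chrome instead, a roughly equivalent (untested) sketch with the same Selenium-3-style API would be:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Same idea as above, but driving headless Chrome instead of Firefox
options = webdriver.ChromeOptions()
options.headless = True
driver = webdriver.Chrome(executable_path=ChromeDriverManager().install(), options=options)
driver.get('https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB')
products = driver.find_elements_by_xpath("//div[@class='nsv-product ns_b_a']")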

Upvotes: 4

ahmedshahriar

Reputation: 1076

You need to inspect the page source (on Windows: Ctrl + U) and search for window.netset.model.productListDataModel.

It is enclosed in a <script> tag.

What you need to do is parse that enclosed JSON string, <script>window.netset.model.productListDataModel = {....},
which contains your desired product listing (8 products per page).

Here is the code

import re, json
import requests
import lxml
from bs4 import BeautifulSoup

baseurl = 'https://www.atea.dk/eshop/products/?filters=S_Apple%20MacBook%20Pro%2016%20GB'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
response = requests.get(baseurl, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print response
print(response)

# regex to find the script enclosed json string
product_list_raw_str = re.findall(r'productListDataModel\s+=\s+(\{.*?\});\n', response.text)[0].strip()

# parse json string
products_json = json.loads(product_list_raw_str)

# find the product list
product_list = products_json['response']['productlistrows']['productrows']

# check product list count, 8 per page
print(len(product_list))

# iterate the product list
for product in product_list:
    print(product)

It will output -

<Response [200]>
{'buyable': True, 'ispackage': False, 'artcolumndata': 'MK1E3DK/A', 'quantityDisabled': False, 'rowid': '5418526',.........
..... 'showAddToMyList': False, 'showAddToCompareList': True, 'showbuybutton': True}
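
Since your end goal is to be notified when items come back in stock, one option is to filter the parsed rows, for instance on the buyable flag visible in the output above (a sketch; I am assuming that flag reflects the "på lager" status):

# Sketch: keep only rows whose 'buyable' flag is truthy
# (assumption: 'buyable' corresponds to "på lager" / in stock)
available = [p for p in product_list if p.get('buyable')]
for p in available:
    print(p.get('artcolumndata'), 'appears to be in stock')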

Upvotes: 0
