Web scraping: parsing nested JSON and creating a list

Question

I'm trying to scrape an e-commerce website, but the page is dynamic. The HTML source contains a script that holds the product data as JSON.

My code is

from bs4 import BeautifulSoup, SoupStrainer
import requests
import json

url = "https://www.lazada.com.ph/chuwi-pilipinas/?q=All-Products&langFlag=en&from=wangpu&lang=en&pageTypeId=2"

page = requests.get(url)    
data = page.text
soup = BeautifulSoup(data,'html.parser')


scripts = soup.find_all('script')

jsonObj = None
for script in scripts:
    if 'window.pageData = ' in script.text:
        jsonStr = script.text
        jsonStr = jsonStr.split('window.pageData = ')[1]
        jsonObj = json.loads(jsonStr)
        
products = jsonObj['mods']['listItems']

for item in products:
    print (item['productUrl'])

The result is:

PS C:\Users\nate\Documents\Python\LazadaScapper> & "C:/Program Files/Python39/python.exe" c:/Users/nate/Documents/Python/LazadaScapper/LazadaScraper3.py
Traceback (most recent call last):
  File "c:\Users\nate\Documents\Python\LazadaScapper\LazadaScraper3.py", line 21, in <module>
    products = jsonObj['mods']['listItems']
TypeError: 'NoneType' object is not subscriptable
PS C:\Users\nate\Documents\Python\LazadaScapper>

I did some research, and it seems the for loop doesn't work, so the products dictionary is empty.

This is related to a thread that was posted 2 years ago, but the solution there no longer works.

I'm new to Python and still learning. I hope you can help me.

Andrej Kesely · Accepted Answer

The issue is that BeautifulSoup doesn't parse the content of <script> tags.
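One possible workaround, not necessarily the approach this answer originally took, is to skip the HTML parser and search the raw page text for the window.pageData assignment, then decode the JSON that follows it. The sketch below uses only the standard library. The helper name extract_page_data and the sample HTML are made up for illustration, and it assumes the page still embeds its data as "window.pageData = {...}" with the same mods/listItems layout as in the question. With requests, you would pass page.text instead of sample_html.

```python
import json
import re

# Matches the assignment that precedes the embedded JSON object.
PAGE_DATA_PATTERN = re.compile(r"window\.pageData\s*=\s*")


def extract_page_data(html):
    """Return the object assigned to window.pageData, or None if it is absent.

    Hypothetical helper for illustration; it works on the raw page text.
    """
    match = PAGE_DATA_PATTERN.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example a trailing ";").
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Made-up sample standing in for the downloaded page (assumption).
sample_html = (
    "<html><body><script>"
    'window.pageData = {"mods": {"listItems": ['
    '{"productUrl": "//example.com/item-1"}, '
    '{"productUrl": "//example.com/item-2"}]}};'
    "</script></body></html>"
)

page_data = extract_page_data(sample_html)
if page_data is None:
    # Guarding here avoids the "'NoneType' object is not subscriptable" error.
    product_urls = []
    print("window.pageData not found in the page")
else:
    product_urls = [item["productUrl"] for item in page_data["mods"]["listItems"]]
    print(product_urls)
```

Checking for None before indexing means a page without the expected script produces a clear message rather than a TypeError.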
