JHM69

Reputation: 3

Get JSON data using BeautifulSoup in Python web scraping

I am trying to get a JSON object from a page using BeautifulSoup, but I get an error. I can't strip the leading "window.pageData=" so that the rest parses as JSON. I also tried the .replace method to remove "window.pageData=", but that didn't work either. My code:

    link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    all_scripts = soup.find_all('script')
    my_script=all_scripts[3]
    jsData = re.search(r'window.pageData=', my_script.text)
    data = json.loads(jsData.group(1))

Here is the script:

<script>window.pageData={
 "mods": {
   "listItems": [
     {
       "name": "Mint leaf Powder (পুদিনা পাতা গুড়া) (১০০গ্রাম)- Pudina Pata Gura",
       "nid": "125018674",
       "productUrl": "//www.daraz.com.bd/products/mint-leaf-powder-pudina-pata-gura-i125018674-s1045213986.html?search=1",
       "image": "https://static-01.daraz.com.bd/p/e742aabbea46336304f2081a29de1139.jpg",
       "originalPrice": "180.00",
       "originalPriceShow": "৳ 180",
       "price": "171",
     }
   ]
 }
}</script>

Upvotes: 0

Views: 662

Answers (1)

Yash

Reputation: 1281

I don't know what you tried with .replace(), but this works for me.

    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    all_scripts = soup.find_all('script')
    # Index 3 is the <script> tag that holds window.pageData on this page.
    my_script = all_scripts[3]
    # Strip the JavaScript assignment prefix so only the JSON text remains.
    json_text = re.sub(r'window\.pageData=', "", my_script.text)
    # Equivalent without a regex:
    # json_text = my_script.text.replace("window.pageData=", "")
    data = json.loads(json_text)
    print(data)
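
As for the error in your code: the pattern passed to re.search has no capture group, so jsData.group(1) raises IndexError. Picking the script by position can also break if the page adds or reorders its script tags. Below is a minimal sketch of an alternative that finds the assignment by its text and lets json.JSONDecoder.raw_decode parse just the object that follows it. SAMPLE_HTML and extract_page_data are names I made up for illustration, and the sample is a shortened stand-in for the real page, so treat this as one possible approach rather than the definitive fix.

```python
import json
import re

# Hypothetical sample page, shortened from the script in the question.
SAMPLE_HTML = """
<html><head>
<script>var other = 1;</script>
<script>window.pageData={
 "mods": {
   "listItems": [
     {"name": "Pudina Pata Gura", "nid": "125018674", "price": "171"}
   ]
 }
}</script>
</head></html>
"""


def extract_page_data(html):
    """Return the object assigned to window.pageData, or None if absent."""
    marker = re.search(r"window\.pageData\s*=\s*", html)
    if marker is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a trailing ';' or '</script>').
    data, _end = json.JSONDecoder().raw_decode(html, marker.end())
    return data


data = extract_page_data(SAMPLE_HTML)
for item in data["mods"]["listItems"]:
    print(item["name"], item["price"])
```

With the live page you would pass r.text instead of SAMPLE_HTML. Note that the snippet pasted in the question has a comma after the "price" entry; if the real script contains that comma too, strict JSON parsing will reject it, so it is worth checking the raw text first.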

Upvotes: 1
