Reputation: 3
I am trying to extract a JSON object from a page with BeautifulSoup, but I get an error. I can't cleanly strip the "window.pageData=" prefix. I also tried the .replace method to remove "window.pageData=", but that failed with an error too. My code:
link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
all_scripts = soup.find_all('script')
my_script=all_scripts[3]
jsData = re.search(r'window.pageData=', my_script.text)
data = json.loads(jsData.group(1))
Here is the script tag I am trying to parse:
<script>window.pageData={
"mods": {
"listItems": [
{
"name": "Mint leaf Powder (পুদিনা পাতা গুড়া) (১০০গ্রাম)- Pudina Pata Gura",
"nid": "125018674",
"productUrl": "//www.daraz.com.bd/products/mint-leaf-powder-pudina-pata-gura-i125018674-s1045213986.html?search=1",
"image": "https://static-01.daraz.com.bd/p/e742aabbea46336304f2081a29de1139.jpg",
"originalPrice": "180.00",
"originalPriceShow": "৳ 180",
"price": "171",
}
]
}
}</script>
Upvotes: 0
Views: 662
Reputation: 1281
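I don't know what you tried with .replace(), but this works for me. The regex in your code has no capture group, so jsData.group(1) raises an error; removing the prefix and passing the rest to json.loads avoids that.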
import requests
from bs4 import BeautifulSoup
import re
import json
link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
all_scripts = soup.find_all('script')
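my_script = all_scripts[3]
# Strip the JavaScript assignment prefix so only the JSON text remains
json_text = re.sub(r'window\.pageData=', "", my_script.text)
# Equivalent without a regex:
# json_text = my_script.text.replace("window.pageData=", "")
# print(json_text)
data = json.loads(json_text)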
print(data)
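Picking the script with all_scripts[3] will break if the site adds or reorders its script tags, so it may be safer to search for the "window.pageData=" marker itself. Below is a minimal sketch of that idea using only the standard library. The helper name extract_page_data and the sample string are made up for illustration, and it assumes the value assigned to window.pageData is valid JSON.

```python
import json

MARKER = "window.pageData="  # assignment prefix seen in the question's script tag


def extract_page_data(text, marker=MARKER):
    """Return the JSON value assigned right after `marker` in `text`.

    Hypothetical helper for illustration; not part of any library.
    """
    start = text.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found")
    # Skip the marker and any whitespace before the opening brace.
    payload = text[start + len(marker):].lstrip()
    # raw_decode parses one JSON value and ignores whatever follows it
    # (for example a trailing semicolon or more JavaScript).
    data, _end = json.JSONDecoder().raw_decode(payload)
    return data


# Made-up sample, shaped like the script in the question.
sample = 'window.pageData={"mods": {"listItems": [{"name": "Pudina", "price": "171"}]}};'
data = extract_page_data(sample)
print(data["mods"]["listItems"][0]["price"])
```

Because the helper looks for the marker anywhere in the text, it should also be possible to pass it r.text directly, or the text of each script tag until one of them contains the marker.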
Upvotes: 1