Reputation: 35
I'm trying to parse the following JSON, but I always get the error "JSONDecodeError: Expecting ',' delimiter".
Here is the code I'm using:
import requests
from bs4 import BeautifulSoup
import json
page_link="https://www.indeed.com/cmp/Ocean-Beauty-Seafoods/reviews?start=0"
page_response = requests.get(page_link, verify=False)
soup = BeautifulSoup(page_response.content, 'html.parser')
strJson=soup.findAll('script')[16].text.replace("\n window._initialData=JSON.parse(\'","").replace("');","")
json.loads(strJson)
Many thanks.
Upvotes: 0
Views: 635
Reputation: 195528
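The string as extracted isn't valid JSON yet, most likely because it is still the body of a JavaScript string literal whose escape sequences haven't been decoded. Try to "preprocess" it first with ast.literal_eval: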
import json
import requests
from ast import literal_eval
from bs4 import BeautifulSoup
page_link = "https://www.indeed.com/cmp/Ocean-Beauty-Seafoods/reviews?start=0"
page_response = requests.get(page_link, verify=False)
soup = BeautifulSoup(page_response.content, "html.parser")
strJson = (
    soup.findAll("script")[16]
    .text.replace("\n window._initialData=JSON.parse('", "")
    .replace("');", "")
)
s = literal_eval("'''" + strJson + "'''")
data = json.loads(s)
print(json.dumps(data, indent=4))
Prints:
{
    "breadcrumbs": {
        "breadcrumbs": [
            {
                "name": "Companies",
                "noFollow": false,
                "url": "https://www.indeed.com/companies"
            },
            {
                "name": "Ocean Beauty Seafoods",
                "noFollow": false,
                "url": "https://www.indeed.com/cmp/Ocean-Beauty-Seafoods"
            },
            {
                "name": "Employee Reviews",
                "noFollow": false
            }
        ]
    },
    "companyPageFooter": {
        "enabledToShowUserFeedbackForm": false,
        "encodedFccId": "a9c95405fb0cdb1c",
        "stickyJobsTabLink": {
            "jobsLink": "/cmp/Ocean-Beauty-Seafoods/jobs"
        }
    },
    "companyPageHeader": {
        "auroraLogoUrl": "https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/64x64/147cafc3914ffb4693dc99df6ad0b169",
        "auroraLogoUrl2x": "https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/128x128/147cafc3914ffb4693dc99df6ad0b169",
        "brandColor": "#FFFFFF",
        "companyHeader": {
            "name": "Ocean Beauty Seafoods",
            "rating": 3.7,
            "reviewCount": 114,
            "reviewCountFormatted": "114",
            "reviewsUrl": "/cmp/Ocean-Beauty-Seafoods/reviews"
        },
...and so on.
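To see why the extra literal_eval step helps, here is a minimal offline sketch. The value of raw is a made-up stand-in for the text between JSON.parse(' and '); and is not the real page content; it only assumes that the page double-escapes quotes the way a JavaScript string literal would.

```python
import json
from ast import literal_eval

# Hypothetical sample (an assumption, not taken from the page): a JSON document
# as it might appear inside a JavaScript string literal, where every backslash
# of the JSON text has itself been escaped.
raw = r'{"note":"a \\"quoted\\" word"}'

# Parsing the text directly fails: json sees \\ as an escaped backslash,
# so the following quote ends the string too early.
try:
    json.loads(raw)
    direct_error = None
except json.JSONDecodeError as exc:
    direct_error = exc.msg

print(direct_error)

# Wrapping the text in triple quotes and evaluating it as a Python string
# literal undoes one level of escaping, leaving plain JSON text.
decoded = literal_eval("'''" + raw + "'''")
data = json.loads(decoded)
print(data["note"])
```

Note that this trick relies on Python and JavaScript string escapes overlapping for the input at hand, and it would break if the extracted text itself contained three consecutive single quotes.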
Upvotes: 2