oli

Reputation: 35

JSONDecodeError: Expecting ',' delimiter in a long JSON string

I'm trying to parse the JSON embedded in the page below, but I always get the error "JSONDecodeError: Expecting ',' delimiter".

Here is the code I'm using:

import requests
from bs4 import  BeautifulSoup
import json

page_link="https://www.indeed.com/cmp/Ocean-Beauty-Seafoods/reviews?start=0"
page_response = requests.get(page_link, verify=False)
soup = BeautifulSoup(page_response.content, 'html.parser')
strJson=soup.findAll('script')[16].text.replace("\n    window._initialData=JSON.parse(\'","").replace("');","")
json.loads(strJson)

Many thanks

Upvotes: 0

Views: 635

Answers (1)

Andrej Kesely

Reputation: 195528

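The JSON as it stands isn't valid: the text sits inside a JavaScript string literal (the argument of JSON.parse('...')), so its backslash escapes most likely need to be decoded before json.loads can read it. Try to "preprocess" it first with ast.literal_eval: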

import json
import requests
from ast import literal_eval
from bs4 import BeautifulSoup


page_link = "https://www.indeed.com/cmp/Ocean-Beauty-Seafoods/reviews?start=0"
page_response = requests.get(page_link, verify=False)
soup = BeautifulSoup(page_response.content, "html.parser")

strJson = (
    soup.findAll("script")[16]
    .text.replace("\n    window._initialData=JSON.parse('", "")
    .replace("');", "")
)

s = literal_eval("'''" + strJson + "'''")
data = json.loads(s)

print(json.dumps(data, indent=4))

Prints:

{
    "breadcrumbs": {
        "breadcrumbs": [
            {
                "name": "Companies",
                "noFollow": false,
                "url": "https://www.indeed.com/companies"
            },
            {
                "name": "Ocean Beauty Seafoods",
                "noFollow": false,
                "url": "https://www.indeed.com/cmp/Ocean-Beauty-Seafoods"
            },
            {
                "name": "Employee Reviews",
                "noFollow": false
            }
        ]
    },
    "companyPageFooter": {
        "enabledToShowUserFeedbackForm": false,
        "encodedFccId": "a9c95405fb0cdb1c",
        "stickyJobsTabLink": {
            "jobsLink": "/cmp/Ocean-Beauty-Seafoods/jobs"
        }
    },
    "companyPageHeader": {
        "auroraLogoUrl": "https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/64x64/147cafc3914ffb4693dc99df6ad0b169",
        "auroraLogoUrl2x": "https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/128x128/147cafc3914ffb4693dc99df6ad0b169",
        "brandColor": "#FFFFFF",
        "companyHeader": {
            "name": "Ocean Beauty Seafoods",
            "rating": 3.7,
            "reviewCount": 114,
            "reviewCountFormatted": "114",
            "reviewsUrl": "/cmp/Ocean-Beauty-Seafoods/reviews"
        },

...and so on.
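
To see why the extra step helps, here is a minimal offline sketch. The `raw` string is a made-up payload, not the real Indeed data; I'm assuming the page escapes inner quotes the way a JavaScript string literal would (a doubled backslash before each inner quote), which would explain the "Expecting ',' delimiter" message:

```python
import json
from ast import literal_eval

# Hypothetical payload, as it might appear between the quotes of
# JSON.parse('...') in the page source. Not the real Indeed data.
raw = r'{"review": "He said \\"great\\" place", "owner": "Ocean\'s"}'

# Parsing the raw text directly fails: json reads the doubled backslash as
# one escaped backslash, so the next quote closes the string too early.
error_msg = None
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    error_msg = exc.msg
print("raw text fails with:", error_msg)

# Wrapping the text in triple quotes and evaluating it as a Python string
# literal decodes one level of escapes, which leaves valid JSON behind.
decoded = literal_eval("'''" + raw + "'''")
data = json.loads(decoded)
print(data["review"])
print(data["owner"])
```

Note that the triple-quote trick is only a sketch of the idea: it would break if the payload itself contained `'''` or ended with a single quote, and Python's escape rules are close to JavaScript's but not identical.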

Upvotes: 2
