Mustard Tiger

Reputation: 3671

Python not reading valid JSON

I am scraping the HTML source of a web page to extract data stored in JSON format.

This is the code:

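import json

import requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/quote/SPY'
result = requests.get(url)

c = result.content
html = BeautifulSoup(c, 'html.parser')
scripts = html.find_all('script')

sl = []
for s in scripts:
    sl.append(s)

s = sl[-3]      # the <script> tag that holds the data
s = s.contents  # list of the tag's children
s = str(s)
s = s[119:-16]  # trim the text before and after the JSON

json_data = json.loads(s)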

Running the above throws this error:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 7506 (char 7505)

When I take the contents of the variable s and paste them into a JSON formatter, they are recognized as valid JSON.

I used the following website to check the JSON: http://jsonprettyprint.com/json-pretty-printer.php

Why does this error come up when using json.loads() in Python? Is it something to do with the string not being encoded properly, or with the presence of escape characters?

How do I solve this?

Upvotes: 3

Views: 1517

Answers (4)

ap288

Reputation: 41

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 7484 (char 7483)

The error message gives the character offset of the failure, so you can print a slice of the string around it to see where parsing fails.

print(s[7400:7500])
mailboxes.isPrimary=\\"true\\" AND ymreq

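As skaul05 stated, it fails at the true token in the string. The \\" just before it is read as an escaped backslash followed by a closing quote, so true ends up outside the string and the parser expects a ',' delimiter there.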
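The slice does not have to be picked by hand: json.JSONDecodeError carries the failing offset in its pos attribute and the bare message in msg. A minimal sketch; the helper name show_decode_error and the sample string are hypothetical, the sample mimicking the doubled backslashes in the output above.

```python
import json


def show_decode_error(text, context=40):
    """Parse text as JSON; on failure, return the message, offset and nearby text."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        return err.msg, err.pos, text[start:err.pos + context]
    return None  # text parsed cleanly


# Hypothetical sample: the backslash before each quote has been doubled,
# so \\" closes the string early and leaves true outside it.
bad = '{"q": "isPrimary=\\\\"true\\\\" AND x"}'

msg, pos, snippet = show_decode_error(bad)
print(msg)                # Expecting ',' delimiter
print(bad[pos:pos + 4])   # true
print(snippet)
```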

Upvotes: 1

Lawrence Khan

Reputation: 62

If it were valid JSON-formatted text, the parser wouldn't complain. This is how I tested it:

# first I scraped that page
$ curl https://finance.yahoo.com/quote/SPY > SPY.json

# then tried to parse it using json
>>> import json
>>> a = open("SPY.json")
>>> b = json.load(a)
ValueError: No JSON object could be decoded

You probably need to parse it into valid XML first.

Upvotes: -1

Tom.chen.kang

Reputation: 183

import requests
from bs4 import BeautifulSoup
import json

url = 'https://finance.yahoo.com/quote/SPY'
result = requests.get(url)

c = result.content
html = BeautifulSoup(c, 'html.parser')
scripts = html.find_all('script')

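sl = []
for s in scripts:
    sl.append(s)

s = sl[-3]
s = s.contents

# s[0] is the script's text itself; slice it directly instead of
# slicing str(s), the string form of the whole list
a = s[0][111:-12]

jjjj = json.loads(a)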

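The problem is in how you handle the list: you just call str() on it, which returns the list's repr with every backslash doubled. Take its first element, s[0], which is the script text, and slice that instead.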
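The difference can be reproduced offline, without fetching the page. A minimal sketch, assuming a hypothetical one-item list standing in for the .contents result, whose text holds escaped quotes like the isPrimary=\"true\" fragment: the text taken with [0] parses, while the same text cut out of str(list) does not.

```python
import json

# Hypothetical script text: valid JSON with escaped quotes inside a string
script_text = '{"query": "isPrimary=\\"true\\""}'
contents = [script_text]  # stands in for the list returned by .contents

via_str = str(contents)   # repr of the list: "['...']" with every backslash doubled
via_index = contents[0]   # the script text itself, unchanged

print(via_str)
print(json.loads(via_index))  # parses fine

# Strip the leading "['" and trailing "']" and try to parse what is left
try:
    json.loads(via_str[2:-2])
    str_error = None
except json.JSONDecodeError as err:
    str_error = err.msg
print("str() version:", str_error)
```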

Upvotes: -1

skaul05

Reputation: 2334

Your JSON contains certain unexpected tokens like true. Use json.dumps first to resolve it.

print(json.dumps(s, indent=2))
s = json.dumps(s)
json_data = json.loads(s)
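What this round trip returns can be checked offline. A minimal sketch, assuming an arbitrary example string s: json.dumps() on a str encodes it as a JSON string literal, and json.loads() decodes that literal back to the same str.

```python
import json

s = '{"a": true}'        # example input; any str behaves the same way
encoded = json.dumps(s)  # JSON string literal: outer quotes added, inner quotes escaped
decoded = json.loads(encoded)

print(encoded)                      # "{\"a\": true}"
print(type(decoded), decoded == s)  # <class 'str'> True
```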

Upvotes: 2
