Filtered

Reputation: 75

Trying to convert JSON to a dictionary in Python

I scraped a website for its application/ld+json script tag, which contains JSON, and I want to convert that string to a Python dictionary, but it doesn't work. In the terminal I get the error JSONDecodeError("Expecting value", s, err.value) from None. I'm relatively new to working with JSON, so I may have made a simple mistake, but nothing I found on Stack Overflow worked. Any help would be greatly appreciated, and thank you for taking the time to read my post!

Here is my code:

from flask import Flask, render_template
from bs4 import BeautifulSoup
import requests
import json

source = requests.get('https://www.visionlearning.com/en/library/Chemistry/1/Nuclear-Chemistry/59').text
soup = BeautifulSoup(source, 'html.parser')

jsonString = str(soup.find_all('script', type='application/ld+json')[0])
print(json.loads(jsonString))

Upvotes: 0

Views: 234

Answers (4)

Filtered

Reputation: 75

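This is what finally worked: I added .contents[0] to the end of the expression assigned to jsonString, so that only the text inside the <script> tag is passed to json.loads.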

from bs4 import BeautifulSoup
import requests
import json

source = requests.get('https://www.visionlearning.com/en/library/Chemistry/1/Nuclear-Chemistry/59')
soup = BeautifulSoup(source.content, 'html.parser')

# .contents[0] is the text node inside the first matching <script> tag
jsonString = soup.find_all('script', type='application/ld+json')[0].contents[0]
print(json.loads(jsonString))

Thank you for all the help though!

Upvotes: 0

Prayson W. Daniel

Reputation: 15578

Since you only need the first match, you don't have to use .find_all; .find returns the first match. Get the tag's text with .get_text() or .text, then parse it with json.loads.

from bs4 import BeautifulSoup
import requests
import json

source = requests.get('https://www.visionlearning.com/en/library/Chemistry/1/Nuclear-Chemistry/59').text
soup = BeautifulSoup(source, 'html.parser')

jsonString = soup.find('script', type='application/ld+json')

print(json.loads(jsonString.get_text(strip=True)))
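
One thing to watch with .find is that it returns None when no tag matches, so the call to .get_text would raise an AttributeError on a page without the tag. Here is a minimal sketch of a guarded version. The html string and the helper name extract_ld_json are made up for illustration (they stand in for the downloaded page and are not part of the original code), and it reads the tag's .string instead of calling .get_text.

```python
import json
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page (assumption, not the real site).
html = """
<html><head>
<script type="application/ld+json">
{"@type": "Article", "name": "Nuclear Chemistry"}
</script>
</head><body></body></html>
"""


def extract_ld_json(markup):
    """Return the first application/ld+json block as a dict, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find("script", type="application/ld+json")
    # .find returns None when nothing matches; .string is None for an empty tag.
    if tag is None or tag.string is None:
        return None
    # json.loads tolerates the whitespace around the JSON text.
    return json.loads(tag.string)


data = extract_ld_json(html)
print(data["name"])
```

With a real page, you would pass requests.get(url).text to the helper in place of the sample string.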

Upvotes: 1

import requests
from bs4 import BeautifulSoup
import json


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
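    # Filter by type so an unrelated <script> earlier in the page is not picked up.
    target = json.loads(soup.find("script", type="application/ld+json").text)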
    print(target.keys())


main("https://www.visionlearning.com/en/library/Chemistry/1/Nuclear-Chemistry/59")

Output:

dict_keys(['@context', '@type', 'mainEntityOfPage', 'name', 'headline', 'author', 'datePublished', 'dateModified', 'image', 'publisher', 'description', 'keywords', 'inLanguage', 'copyrightHolder', 'copyrightYear'])

Upvotes: 1

Frank

Reputation: 1285

If you print jsonString, you will see that it includes the <script> tag itself, which is not valid JSON. Pass only the tag's inner content instead:

jsonString = str(soup.find_all('script', type='application/ld+json')[0].text)
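
To see why the wrapper matters, here is a small sketch that uses only the standard library. The two strings are hypothetical stand-ins: wrapped mimics what str(tag) produces and inner mimics what tag.text produces. They are not the real page content, and try_parse is just an illustrative helper.

```python
import json

# Hypothetical stand-ins: `inner` mimics tag.text, `wrapped` mimics str(tag).
inner = '{"@context": "https://schema.org", "@type": "Article"}'
wrapped = '<script type="application/ld+json">' + inner + '</script>'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The parser stops at the leading "<", which cannot start a JSON value.
        return None


print(try_parse(wrapped))  # the <script> wrapper makes this fail to parse
print(try_parse(inner))    # the bare JSON text parses into a dict
```

The first call fails at the very first character, which is where the "Expecting value" message in the question comes from.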

Upvotes: 2
