No Data in JSON Array - BeautifulSoup and Python 3

Question

The script below keeps writing an empty array when I try to dump the results to a JSON file. No errors appear when the script is run, and nothing is printed in the terminal either. I have similar scripts for other websites that work perfectly. Here is my code. Thanks in advance.

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
import json

openstax = 'https://cnx.org'

#opening up connection and grabbing page
uClient = urlopen(openstax)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#grabs info for each textbook
containers = page_soup.findAll("div",{"class":"book"})

data = []
for container in containers:
    item = {}
    item['type'] = "Textbook"
    item['title'] = container.h3.a.text
    data.append(item)
    print(item['title'])

with open("./json/openstax.json", "w") as writeJSON:
    json.dump(data, writeJSON, ensure_ascii=False)

dethos · Accepted Answer

The page you are fetching (the URL in the openstax variable) is rendered on the client side with JavaScript, so the final HTML is not present in the response to the request your code makes.

Because of this, page_soup.findAll("div",{"class":"book"}) returns no elements. The loop body therefore never runs, nothing is printed, and the JSON file ends up containing an empty array.
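To make this concrete, here is a minimal offline sketch. SHELL_HTML is a made-up stand-in for the kind of bare HTML a client-rendered site sends before any JavaScript runs (it is not the actual response from cnx.org), and extract_titles is a hypothetical helper that mirrors the loop in the question:

```python
import json
from bs4 import BeautifulSoup

# Assumption: a made-up example of a JavaScript "shell" page. The real
# response from the site will differ; this only illustrates the effect.
SHELL_HTML = """<html><body>
<div id="main"></div>
<script src="/scripts/main.js"></script>
</body></html>"""


def extract_titles(html):
    """Return one dict per div.book, mirroring the loop in the question."""
    page_soup = BeautifulSoup(html, "html.parser")
    containers = page_soup.find_all("div", {"class": "book"})
    return [{"type": "Textbook", "title": c.h3.a.text} for c in containers]


data = extract_titles(SHELL_HTML)
print(len(data))         # number of div.book matches in the static shell
print(json.dumps(data))  # what would end up in the JSON file
```

With no div.book elements in the static markup, the list comprehension has nothing to iterate over, which matches the silent, error-free behavior described in the question. Printing len(containers) right after the search is a quick way to spot this in the original script.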

As the noscript element in the returned HTML of that page states, you should try the http://legacy.cnx.org/content URL if you don't want to use the JavaScript-rendered page.
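A quick way to find that kind of hint yourself is to look at the noscript elements in the HTML you actually received. The sketch below is only an illustration: SHELL_HTML and the wording inside its noscript tag are invented for the example, and noscript_messages is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Assumption: invented markup; the real page's noscript wording will differ.
SHELL_HTML = """<html><body>
<div id="main"></div>
<noscript>JavaScript is required. See http://legacy.cnx.org/content</noscript>
<script src="/scripts/main.js"></script>
</body></html>"""


def noscript_messages(html):
    """Collect the text of every noscript element in the given HTML."""
    page_soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in page_soup.find_all("noscript")]


for message in noscript_messages(SHELL_HTML):
    print(message)
```

In the original script you would pass page_html to the helper instead of the sample string. If the output mentions JavaScript, the content you are looking for is most likely not in the raw response.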
