Coozgan

Reputation: 393

Getting JSON data from a website using BeautifulSoup

Sorry, I am a bit new to this. I would like to extract one specific value from the JSON embedded in the page: "getMe":"IneedThisData"

from bs4 import BeautifulSoup
import json

html_doc = """
<!DOCTYPE html>
<html>
<head>
    <title>Sample</title>
</head>
<body>
<script type="text/javascript">utag_cfg_ovrd = window.utag_cfg_ovrd || {};utag_cfg_ovrd.noview = true;
</script>
<script async="" src="/assets/AppMeasurement.js">
</script>
<script>
    window.REDUX_STATE = {"appConfig":
    {"dataLab":"energy","minimum":"maximum","getMe":"IneedThisData"}}
</script>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
data = json.loads(soup.find('script', 'window.REDUX_STATE').text)

I am getting the error AttributeError: 'NoneType' object has no attribute 'text', and I'm still stuck on loading that data into a variable.

Upvotes: 0

Views: 160

Answers (1)

narendra-choudhary

Reputation: 4818

Assuming "minimum":"maximum":"getMe" is a typo for "minimum":"maximum","getMe" (which makes the object valid JSON), you can use the following code:

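import re

soup = BeautifulSoup(html_doc, 'html.parser')
# Find the <script> tag whose body mentions window.REDUX_STATE
tag = soup.find("script", string=re.compile(r"window\.REDUX_STATE"))
text = str(tag.contents[0])
# Split on the first "=" only, in case the JSON itself contains "="
splits = text.split("=", 1)
data = json.loads(splits[1])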
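
To get the value asked about in the question, index into the parsed dictionary. Here is a minimal self-contained sketch, assuming the script body has the form window.REDUX_STATE = {...} with an optional trailing semicolon. The shortened html_doc and the helper name extract_redux_state are illustrative and not part of the original post.

```python
import json
import re

from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (assumption: same script layout).
html_doc = """
<html><body>
<script>
    window.REDUX_STATE = {"appConfig":
    {"dataLab":"energy","minimum":"maximum","getMe":"IneedThisData"}}
</script>
</body></html>
"""


def extract_redux_state(html):
    """Hypothetical helper: return the REDUX_STATE object as a dict, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", string=re.compile(r"window\.REDUX_STATE"))
    if tag is None:
        # No matching <script>; let the caller decide what to do.
        return None
    # Split on the first "=" only, then drop whitespace and a trailing ";".
    payload = tag.string.split("=", 1)[1].strip().rstrip(";")
    return json.loads(payload)


data = extract_redux_state(html_doc)
print(data["appConfig"]["getMe"])
```

Returning None when no tag matches keeps the failure explicit instead of raising an AttributeError further down.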

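In your code, soup.find('script', 'window.REDUX_STATE') does not match any tag, so find returns None, and calling .text on None raises the AttributeError.
The second positional argument of find is attrs, which filters tags by their attributes; a plain string passed there is treated as a CSS class name. "window.REDUX_STATE" is part of the script's text, not an attribute or a class.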
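
The behavior of the second argument can be checked with a small sketch. The two-script HTML snippet and the class name "state" are made up for illustration; only the find calls mirror the question.

```python
from bs4 import BeautifulSoup

# Made-up markup: one script with a class, one without.
html = '<script class="state">var a = 1;</script><script>var b = 2;</script>'
soup = BeautifulSoup(html, "html.parser")

# A string as the second argument is matched against the class attribute.
by_class = soup.find("script", "state")

# No script has class "window.REDUX_STATE", so find returns None.
missing = soup.find("script", "window.REDUX_STATE")

print(by_class is not None)
print(missing is None)

# Guarding against None avoids the AttributeError from the question.
text = missing.text if missing is not None else None
```

Checking the result of find for None before using it is a cheap way to surface a selector that did not match.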

Upvotes: 1
