Reputation: 393
Sorry, I am a bit new to this. I would like to extract one specific value from the JSON embedded in a page: "getMe":"IneedThisData"
from bs4 import BeautifulSoup
import json
html_doc = """
<!DOCTYPE html>
<html>
<head>
<title>Sample</title>
</head>
<body>
<script type="text/javascript">utag_cfg_ovrd = window.utag_cfg_ovrd || {};utag_cfg_ovrd.noview = true;
</script>
<script async="" src="/assets/AppMeasurement.js">
</script>
<script>
window.REDUX_STATE = {"appConfig":
{"dataLab":"energy","minimum":"maximum","getMe":"IneedThisData"}}
</script>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
data = json.loads(soup.find('script', 'window.REDUX_STATE').text)
I am getting the error AttributeError: 'NoneType' object has no attribute 'text'
I'm still stuck on loading that data into a variable.
Upvotes: 0
Views: 160
Reputation: 4818
Assuming "minimum":"maximum":"getMe" is a typo and should actually be "minimum":"maximum","getMe" (which makes it valid JSON), you can use the following code:
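import re
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the <script> tag whose text contains "window.REDUX_STATE"
tag = soup.find("script", string=re.compile(r"window\.REDUX_STATE"))
text = str(tag.contents[0])
# Split on the first "=" only, so any "=" inside the JSON is preserved
splits = text.split("=", 1)
data = json.loads(splits[1])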
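If the real page ends the assignment with a semicolon or has more JavaScript after the object, json.loads on the split text will fail. Here is a minimal sketch of a more tolerant variant, assuming the object is the first thing after the first "=". The helper name extract_redux_state and the trimmed html_doc (with a trailing ";" added on purpose) are mine, not from the question.

```python
import json
import re

from bs4 import BeautifulSoup

# Trimmed copy of the page from the question; the trailing ";" is added on purpose.
html_doc = """
<html><body>
<script>
window.REDUX_STATE = {"appConfig":
{"dataLab":"energy","minimum":"maximum","getMe":"IneedThisData"}};
</script>
</body></html>
"""


def extract_redux_state(html):
    """Return the object assigned to window.REDUX_STATE as a dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", string=re.compile(r"window\.REDUX_STATE"))
    if tag is None:
        raise ValueError("no <script> containing window.REDUX_STATE")
    # Keep everything after the first "=", then strip leading whitespace,
    # because raw_decode does not skip it.
    _, _, payload = tag.string.partition("=")
    # raw_decode parses one JSON value and ignores whatever follows it.
    state, _ = json.JSONDecoder().raw_decode(payload.lstrip())
    return state


state = extract_redux_state(html_doc)
get_me = state["appConfig"]["getMe"]
print(get_me)
```

The value you asked for ends up in get_me. The explicit None check also gives a clearer message than the AttributeError when the script tag is missing.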
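In your code, soup.find('script', 'window.REDUX_STATE') does not match any tag, so it returns None, and calling .text on None raises the AttributeError.
The second positional argument of find is attrs, which filters tags by their attributes; a plain string passed there is treated as a CSS class name. "window.REDUX_STATE" is part of the script's text, not an attribute or class of the tag.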
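To see what the second argument actually does, here is a small sketch with made-up markup (the class name "tracker" is only for illustration):

```python
from bs4 import BeautifulSoup

# Made-up markup: one <script> with a class, one without.
html = '<script class="tracker">var a = 1;</script><script>var b = 2;</script>'
soup = BeautifulSoup(html, "html.parser")

# A plain string as the second argument is matched against the class attribute.
by_class = soup.find("script", "tracker")

# No <script> has the class "window.REDUX_STATE", so find returns None.
missing = soup.find("script", "window.REDUX_STATE")

print(by_class is not None)
print(missing)
```

The first lookup finds the tag with class "tracker"; the second returns None, which is exactly what happens in your code before .text is accessed.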
Upvotes: 1