Reputation: 635
I am trying to get hlsUrl from this tag.
import json
import re
from bs4 import BeautifulSoup
html_doc = """
<script>
var modelData = {
"hlsUrl": "null",
"account": "1V2FO4K7ME78RV09VXNEC",
"packageName": "null",
isActive: false
}
</script>
"""
soup = BeautifulSoup(html_doc, "html.parser")
script_text = soup.select_one("script").string
model_data = re.search(r"modelData = ({.*?})", script_text, re.S).group(1)
print(json.loads(model_data)["account"])
But I am getting this error:
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 5 column 9 (char 111)
I know it's because it is not valid json because of the isActive: false
How do I turn it into valid json or otherwise to get hlsUrl?
Upvotes: 0
Views: 45
Reputation: 195553
import json
import re
from bs4 import BeautifulSoup
html_doc = """
<script>
var modelData = {
"hlsUrl": "null",
"account": "1V2FO4K7ME78RV09VXNEC",
"packageName": "null",
isActive: false
}
</script>
"""
soup = BeautifulSoup(html_doc, "html.parser")
script_text = soup.select_one("script").string
model_data = re.search(r"modelData = ({.*?})", script_text, re.S).group(1)
model_data = re.sub(r"^\s*([^\s\"]+):", r'"\1":', model_data, flags=re.M) # <-- fix `isActive:` to `"isActive":`
print(json.loads(model_data)["account"])
Prints:
1V2FO4K7ME78RV09VXNEC
Upvotes: 1