Reputation: 63
I need to get the JSON that is stored in a JavaScript variable, and that variable is inside a script tag like this:
<script type="text/javascript">
But it is not the only type="text/javascript" script in the HTML. The tag I need is the following:
<script type="text/javascript">
var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
</script>
I have tried the following:
script = soup.find_all('script', {'type': 'text/javascript'})
But it returns all the matching tags, and I don't know how to identify the specific one because it has no id.
Upvotes: 1
Views: 278
Reputation: 195553
beautifulsoup cannot parse JavaScript, but you can use the re/json modules to parse the data. For example:
import re
import json
html_doc = """
<script type="text/javascript">
var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
</script>
"""
products = re.search(r"products = '(.*)'", html_doc).group(1)
products = json.loads(products)
# pretty print the data:
print(json.dumps(products, indent=4))
Prints:
{
    "id": "000000000000193758",
    "name": "FZ1292",
    "estilo": "FZ1292",
    "date": "44348",
    "month": "JUNIO",
    "day": "1",
    "regster": "",
    "price": "3499",
    "name2": "McDonalds x Harden Vol. 5 ",
    "description": "Det\u00e9n tu hambre por jugar b\u00e1squetbol ",
    "image": "www.somesite.com/ADIDAS",
    "realdate": "06-01-2021"
}
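If you already have a soup object and want to pick out the one tag among several, one option is to loop over the result of find_all and apply the same regex to each script's text, keeping the first one that matches. This is only a minimal sketch: it assumes the variable is assigned on a single line and wrapped in single quotes, as in the question. The names extract_products and PRODUCTS_RE are made up for illustration, and the HTML is a shortened sample, not the real page.

```python
import json
import re

from bs4 import BeautifulSoup

# Shortened sample (assumption): two script tags with the same type,
# only the second one defines `products`.
html_doc = """
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
var products = '{"id":"000000000000193758","name":"FZ1292","price":"3499"}'
</script>
"""

# Assumed layout: the JSON sits between single quotes on the same line
# as "var products =". The greedy (.*) runs to the last quote on that line.
PRODUCTS_RE = re.compile(r"var\s+products\s*=\s*'(.*)'")


def extract_products(html):
    """Return the parsed `products` JSON, or None if no script defines it."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", {"type": "text/javascript"}):
        # script.string is None for an empty tag, so fall back to ""
        match = PRODUCTS_RE.search(script.string or "")
        if match:
            return json.loads(match.group(1))
    return None


products = extract_products(html_doc)
print(json.dumps(products, indent=4))
```

The tag is identified by its content rather than by an id, so the order of the script tags in the page does not matter. If the page ever splits the assignment over several lines, the pattern would need to be adjusted.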
Upvotes: 2