Jesus Torres

Reputation: 63

Get JSON from a JavaScript variable inside a script tag with Python and BeautifulSoup (BS4)

I need to extract the JSON stored in a JavaScript variable. The variable is inside a script tag like this:

<script type="text/javascript">

But it is not the only type="text/javascript" script in the HTML. The tag I need is the following:

    <script type="text/javascript">
        var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
</script>

I have tried the following:

script = soup.find_all('script', {'type': 'text/javascript'})

But that returns every matching tag, and I don't know how to identify the specific one because it has no id.

Upvotes: 1

Views: 278

Answers (1)

Andrej Kesely

Reputation: 195553

BeautifulSoup cannot parse JavaScript, but you can use the re and json modules to extract and parse the data. For example:

import re
import json

html_doc = """
    <script type="text/javascript">
        var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
</script>
"""

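# capture everything between the single quotes that follow "products = "
products = re.search(r"products = '(.*)'", html_doc).group(1)
# decode the captured JSON text into a Python dict
products = json.loads(products)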

# pretty print the data:
print(json.dumps(products, indent=4))

Prints:

{
    "id": "000000000000193758",
    "name": "FZ1292",
    "estilo": "FZ1292",
    "date": "44348",
    "month": "JUNIO",
    "day": "1",
    "regster": "",
    "price": "3499",
    "name2": "McDonalds x Harden Vol. 5 ",
    "description": "Det\u00e9n tu hambre por jugar b\u00e1squetbol ",
    "image": "www.somesite.com/ADIDAS",
    "realdate": "06-01-2021"
}
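
If the page has several script tags and you still want BeautifulSoup to pick out the right one, one option is to match the tag by its text instead of its attributes. The following is only a sketch under some assumptions: the html_doc below is a shortened, made-up page with two script tags, the variable is always written as var products = '...', and the JSON itself contains no single quote (the non-greedy pattern would stop early if it did).

```python
import json
import re

from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Hypothetical page: two script tags, only the second defines `products`.
html_doc = """
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
    var products = '{"id":"000000000000193758","name":"FZ1292","price":"3499"}'
</script>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Assumed pattern: the JSON sits between single quotes after "var products =".
pattern = re.compile(r"var\s+products\s*=\s*'(.*?)'", re.DOTALL)

# `string=` makes find() return the first <script> whose text matches the pattern.
script = soup.find("script", string=pattern)

# Reuse the same pattern to pull the JSON text out of that tag, then decode it.
products = json.loads(pattern.search(script.string).group(1))
print(products["name"])  # prints: FZ1292
```

If no script tag matches, soup.find returns None, so real code would probably want to check for that before calling script.string.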

Upvotes: 2
