Python3 scraping a bunch of javascript variables from webpage into a python dict object

Question

I'm using requests and BeautifulSoup4 to download and scrape information from a webpage. I have successfully narrowed it down to everything inside of a particular

furas · Accepted Answer

JavaScript data is mostly in JSON format, so you can use Python's json module to convert it to a Python dictionary.

For example, the data after "videos[0] = " is valid JSON, so you can use data = json.loads(string) to create a dictionary and then read a value such as data['wmv']['size'].

data = '''{
    "wmv": {
        "file": "wmv/01.wmv",
        "name": "01",
        "duration": 502,
        "size": "195.1MB",
        "wid": 854,
        "hgt": 480,
        "st": "1557499029",
        "et": "1557502629",
        "hs": "a0cfdef3b8b9e3dea576368a5bfbaef9",
        "caps": []
    },
    "h264": {
        "file": "h264/01.mp4",
        "name": "01",
        "duration": 502,
        "size": "73.9MB",
        "wid": 854,
        "hgt": 480,
        "st": "1557499029",
        "et": "1557502629",
        "hs": "32901a1870d0b32458b465ac9c3d6cad",
        "caps": [{
            "file": "001.jpg",
            "fs": {
                "st": "1557499029",
                "et": "1557502629",
                "hs": "5b328642a84fa6406bda527c18e46c27"
            },
            "tn": {
                "st": "1557499029",
                "et": "1557502629",
                "hs": "0a4ad7d0edf1b92538b8127f8e297c41"
            }
        }, {
            "file": "002.jpg",
            "fs": {
                "st": "1557499029",
                "et": "1557502629",
                "hs": "4390c0d9b321b5e86c88cb8ca5e56ede"
            },
            "tn": {
                "st": "1557499029",
                "et": "1557502629",
                "hs": "9cf83158268379df660d6d01750a047c"
            }
        }]
    }
}'''

import json

data = json.loads(data)

print(data['wmv']['size'])

# 195.1MB
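In practice the JSON has to be cut out of the script text first. Here is a minimal sketch of one way to do that, assuming the script contains the marker "videos[0] = " followed directly by the JSON value. The script_text content below is made up for illustration; only the marker comes from the page.

```python
import json

# Hypothetical script text for illustration; a real page will contain more.
script_text = 'var videos = new Array();\nvideos[0] = {"wmv": {"size": "195.1MB"}};\n'

marker = "videos[0] = "
# Position of the first character after the marker (the opening "{").
start = script_text.index(marker) + len(marker)

# raw_decode parses one JSON value beginning at `start` and returns it
# together with the index where the value ends, so the trailing ";" is ignored.
value, end = json.JSONDecoder().raw_decode(script_text, start)

print(value["wmv"]["size"])
```

Using raw_decode avoids having to find the end of the object by hand, which gets awkward when the value contains nested braces.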

If every variable is on one line, you can use split('\n') to get the lines and then split('=') to get the key and value.

Then you only have to check whether the value starts with { or [ to decide whether to use json. Other values may be plain strings, so they don't need json; you may only need to remove the surrounding " characters.
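The line-by-line approach could be sketched roughly as follows. The function name parse_js_variables and the sample text are hypothetical, and the sketch assumes one "name = value;" assignment per line. It splits on the first "=" only, so values that contain "=" stay intact.

```python
import json


def parse_js_variables(text):
    """Parse lines of the form `name = value;` into a dict (one variable per line assumed)."""
    result = {}
    for line in text.split('\n'):
        line = line.strip()
        # Skip empty lines, comments, and anything that is not an assignment.
        if '=' not in line or line.startswith('//'):
            continue
        key, value = line.split('=', 1)
        key = key.strip()
        if key.startswith('var '):
            key = key[4:].strip()
        value = value.strip().rstrip(';').strip()
        if value.startswith(('{', '[')):
            # Objects and arrays are parsed as JSON.
            result[key] = json.loads(value)
        else:
            # Everything else is kept as a string, with surrounding quotes removed.
            result[key] = value.strip('"\'')
    return result


# Made-up sample data for illustration.
sample = '''//
var title = "Example";
var count = 3;
videos[0] = {"wmv": {"size": "195.1MB"}};
tags = ["a", "b"];
'''

parsed = parse_js_variables(sample)
print(parsed["videos[0]"]["wmv"]["size"])
```

Note that non-JSON values such as count stay strings here ("3"); convert them yourself if you need numbers.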

Content = '''//
