cclloyd

Reputation: 9235

Python Scrape JSON from webpage

I'm trying to scrape the contents of a JavaScript variable from a webpage. The webpage is a search page, and its source contains something like this:

<script>var test1='balah';var catalog={};var test2='blah'</script>

Here catalog is a large, nested JSON structure.

I know how to parse it, but how can I grab the JSON string from the webpage, assuming I already have the full page's HTML content in a single string variable?

Upvotes: 0

Views: 379

Answers (1)

Derobukal

Reputation: 21

How about using a regular expression?

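import re

content = "<script>var test1='balah';var catalog={'Year':'2019'};var test2='blah'</script>"
# Capture everything after "catalog=" up to the first semicolon (non-greedy).
# re.DOTALL lets "." match newlines in case the value spans several lines.
m = re.search(r'catalog=(.+?);', content, re.DOTALL)
if m:
    result = m.group(1)
    print(result)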

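The result will be {'Year':'2019'}. Note that the pattern stops at the first semicolon after catalog=, so it only works if the value itself contains no semicolons.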
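
Because the pattern above ends the capture at the first semicolon, a catalog value with a semicolon inside one of its strings would be cut short. One possible alternative, sketched here on the assumption that the value is strict JSON (double-quoted keys and strings) and is assigned with `var catalog=`, is to use the regular expression only to find where the value starts and let the standard library's `json.JSONDecoder.raw_decode` find where it ends. The helper name `extract_js_object` and the sample page are made up for illustration.

```python
import json
import re


def extract_js_object(html, var_name):
    """Return the JSON value assigned to `var_name` in `html`, or None.

    Hypothetical helper: assumes the page contains `var <name> = <strict JSON>`.
    """
    # Find "var <name> =" and remember where the value starts.
    m = re.search(r'\bvar\s+' + re.escape(var_name) + r'\s*=\s*', html)
    if m is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (here: ";var test2=...").
    value, _end = json.JSONDecoder().raw_decode(html, m.end())
    return value


# Sample page (illustrative); note the semicolon inside a string value.
html = (
    "<script>var test1='balah';"
    'var catalog={"Year":"2019","note":"a;b"};'
    "var test2='blah'</script>"
)
catalog = extract_js_object(html, "catalog")
print(catalog)
```

This returns an already parsed dict rather than a string. If the value is a JavaScript object literal that is not valid JSON (single quotes, unquoted keys), `raw_decode` raises `json.JSONDecodeError`, and a different approach would be needed.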

Upvotes: 1
