Reputation: 9235
I'm trying to scrape the contents of a JavaScript variable from a webpage. The webpage is a search page, and its source contains something similar to:
<script>var test1='balah';var catalog={};var test2='blah'</script>
where catalog is a large nested JSON structure.
I know how to parse it, but how can I grab the JSON string from the webpage, assuming I already have the full page's HTML content in a single string variable?
Upvotes: 0
Views: 379
Reputation: 21
How about using a regular expression?
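# -*- coding: utf-8 -*-
import re

content = "<script>var test1='balah';var catalog={'Year':'2019'};var test2='blah'</script>"
# [\d\D] matches any character, including newlines.
# The group captures everything after "catalog=" up to the next semicolon.
p = re.compile(r'[\d\D]+catalog=([\d\D]+?);')
m = p.match(content)
if m:
    result = m.group(1)
    print(result)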
The result will be {'Year':'2019'}.
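One caveat: the non-greedy group stops at the first semicolon, so a catalog whose string values contain a `;` would be cut short. A possible alternative, shown here only as a minimal sketch, is to use the regex just to locate the assignment and then let `json.JSONDecoder.raw_decode` read one complete JSON value from that position. This assumes the value is strict JSON (double-quoted keys and strings) and that the assignment is written as `var catalog=`. The helper name `extract_js_object` and the sample `html` string are hypothetical, made up for illustration.

```python
import json
import re


def extract_js_object(html, var_name):
    """Return the JSON value assigned to `var_name` in `html`, or None.

    Hypothetical helper: assumes an assignment of the form
    `var <name> = <strict JSON value>`.
    """
    # Locate the assignment; the match ends right where the value begins.
    m = re.search(r'\bvar\s+' + re.escape(var_name) + r'\s*=\s*', html)
    if m is None:
        return None
    try:
        # raw_decode parses a single JSON value starting at the given index
        # and ignores whatever text follows it.
        value, _end = json.JSONDecoder().raw_decode(html, m.end())
    except ValueError:
        # Raised when the text at that position is not valid JSON.
        return None
    return value


# Made-up sample page; note the semicolon inside a string value.
html = ('<script>var test1=\'balah\';'
        'var catalog={"Year":"2019","note":"a;b"};'
        'var test2=\'blah\'</script>')

catalog = extract_js_object(html, "catalog")
print(catalog)
```

Because the JSON parser decides where the value ends, semicolons and braces inside strings are not a problem. If the page uses single quotes or other JavaScript-only syntax, this approach will return None and a different parser would be needed.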
Upvotes: 1