Reputation: 91
I want to get the "id" values from the JavaScript variable meta using BeautifulSoup and Python. Is this possible? I also don't know how to find the particular <script> tag that contains the meta variable, because it has no unique identifier and the site has many other <script> tags. I'm also using Selenium, so answers that use it are fine too.
<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>
Upvotes: 1
Views: 8724
Reputation: 195428
You can use the built-in re and json modules to extract JavaScript variables:
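from bs4 import BeautifulSoup
import re
import json
from pprint import pprint

data = '''
<html>
<body>
<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>
</body>
</html>
'''

soup = BeautifulSoup(data, 'lxml')
# Pick the <script> whose text contains "var meta" instead of just the first <script> on the page
script = soup.find('script', string=re.compile(r'var\s+meta'))
# Capture everything after "meta =" up to the "}]" that ends a line
match = re.search(r'meta\s*=\s*(.*?}])\s*\n', script.string, flags=re.DOTALL)
# The captured text is "variants":[...]; wrap it in braces so it parses as a JSON object
json_data = json.loads('{' + match.group(1) + '}')
pprint(json_data)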
This prints:
{'variants': [{'id': 12443604615241, 'price': 14000},
              {'id': 12443604648009, 'price': 14000}]}
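The question also mentions that the page has many <script> tags and that only the "id" values are needed. Below is a minimal sketch of one way to handle both. It assumes the variable is written as var meta = ... inside a single <script>, and that the array ends with }] at the end of a line, as in the sample. The code loops over every <script>, keeps the first one whose text matches, and collects the ids. The sample HTML and the helper name extract_variant_ids are hypothetical, and it uses the built-in html.parser so lxml is not required.

```python
import json
import re
from typing import List, Optional

from bs4 import BeautifulSoup

# Hypothetical page with several <script> tags; only one of them defines meta.
HTML = '''
<html>
<body>
<script>var analytics = {"page": "product"};</script>
<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>
<script>console.log("unrelated");</script>
</body>
</html>
'''

# Same idea as the answer's pattern: text after "var meta =" up to a "}]" that ends a line.
META_RE = re.compile(r'var\s+meta\s*=\s*(.*?}])\s*\n', flags=re.DOTALL)


def extract_variant_ids(html: str) -> Optional[List[int]]:
    """Return the "id" of each variant, or None if no <script> defines meta."""
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script'):
        text = script.string or ''  # .string is None for an empty <script>
        match = META_RE.search(text)
        if match:
            # Wrap "variants":[...] in braces so json can parse it as an object.
            data = json.loads('{' + match.group(1) + '}')
            return [variant['id'] for variant in data['variants']]
    return None


print(extract_variant_ids(HTML))  # prints [12443604615241, 12443604648009]
```

Searching the text of every <script> avoids depending on the tag's position on the page. If the real page formats the variable differently, META_RE is the part that would need adjusting.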
Upvotes: 3
Reputation: 2033
If you are using Selenium, there's no need to parse the HTML to get the JS variable. Just use Selenium's driver.execute_script() to pass it to Python:
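from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://whatever.com/')
# Runs in the page's context and returns the JS value converted to the matching Python type
meta = driver.execute_script('return meta')
driver.quit()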
And that's it: meta now holds the JS variable, converted to the corresponding Python type (JS objects become dicts, arrays become lists, numbers become int or float).
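Because the question only needs the "id" values, the mapping can also run in the browser so that only a list of ids comes back to Python. Below is a minimal sketch. It assumes meta is a global object shaped like {"variants": [...]}; the question's excerpt omits the surrounding braces, so this shape is an assumption. It also assumes driver is a Selenium WebDriver that has already loaded the page. The function name variant_ids is hypothetical.

```python
from typing import Any, List


def variant_ids(driver: Any) -> List[int]:
    """Return the "id" of every entry in the page's meta.variants array.

    Assumes (hypothetically) that the page defines a global JS object shaped
    like {"variants": [{"id": ..., "price": ...}, ...]}. `driver` is any
    Selenium WebDriver that has already loaded the page.
    """
    # The map() runs in the browser, so only the list of ids crosses over to Python.
    ids = driver.execute_script(
        "return meta.variants.map(function (v) { return v.id; });"
    )
    # Convert defensively in case a driver hands whole numbers back as floats.
    return [int(i) for i in ids]
```

Call this with the driver from the snippet above while the page is still open. If the assumed shape holds, variant_ids(driver) should return the same ids as the BeautifulSoup approach.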
Upvotes: 8