Can't dig out nicely formatted json content out of some messy script

Question

I've written a script in Python using the requests module along with the BeautifulSoup library and the re module to scrape a script tag that contains nicely formatted JSON content. I'd like to use re to isolate just that portion from the rest of the messy script.

That script tag is in the page source and contains var masterCompanyData =.

Website link

This is what the script with the JSON content looks like (it can be seen by running the following script):

import re
import requests
from bs4 import BeautifulSoup

url = 'https://conference.iste.org/2019/exhibitors/floorplan.php'

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
script = soup.select_one("script:contains('masterCompanyData')").text
# p = re.compile(r'masterCompanyData = (.*);')
# jsonContent = p.findall(script)
# print(jsonContent)
print(script)

The string manipulation that got me that portion:

items = soup.select_one("script:contains('masterCompanyData = ')").text.split("masterCompanyData = ")[1].split("Holder for the current zoom value")[0].split("/**")[0].replace(";","").strip()

Although I've successfully extracted that portion using string manipulation, I don't want to go that way; I'd rather extract the JSON content using regex, but I get an empty list.

How can I get that json content using regex?

QHarr · Accepted Answer

Try the following regex:

import requests
import re
import json

r = requests.get('https://conference.iste.org/2019/exhibitors/floorplan.php')
p1 = re.compile(r'var masterCompanyData = (.*?);\n\n\n', re.DOTALL)
item = p1.findall(r.text)[0]
data = json.loads(item)
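
A likely reason the pattern in the question returns an empty list is that "." does not match newlines by default, so (.*) cannot span a multi-line object. The sketch below illustrates this on a made-up stand-in for the page's script text; the sample_script content is an assumption for illustration, not the real page source.

```python
import json
import re

# Hypothetical stand-in for the page's script text (not the real page source).
sample_script = """
var masterCompanyData = {
    "companies": [
        {"name": "Acme; Inc.", "booth": "101"}
    ]
};


/** Holder for the current zoom value */
var zoom = 1;
"""

# Without re.DOTALL, "." stops at newlines, so a multi-line object never matches.
single_line = re.compile(r'masterCompanyData = (.*);')
no_dotall_matches = single_line.findall(sample_script)

# With re.DOTALL and a lazy quantifier, the capture runs up to the first ";"
# that is followed by three newlines, so a ";" inside a string value is skipped.
multi_line = re.compile(r'var masterCompanyData = (.*?);\n\n\n', re.DOTALL)
dotall_matches = multi_line.findall(sample_script)
data = json.loads(dotall_matches[0])

print(no_dotall_matches)
print(data["companies"][0]["name"])
```

The trailing \n\n\n acts as the end marker, so this pattern relies on the page keeping that exact spacing after the assignment.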

Using your idea:

import requests
import re
import json
from bs4 import BeautifulSoup as bs

r = requests.get('https://conference.iste.org/2019/exhibitors/floorplan.php')
p1 = re.compile(r'var masterCompanyData = (.*?);\n\n\n', re.DOTALL)
soup = bs(r.content, 'lxml')
script = soup.select_one("script:contains('masterCompanyData')").text
string = p1.findall(script)[0]
x = json.loads(string)
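
If the spacing after the assignment ever changes, one alternative (a different technique from the regex above, not part of the original answer) is to use the regex only to find where the value starts and let json.JSONDecoder.raw_decode find where it ends. This is a minimal sketch, assuming the assigned value is strict JSON; extract_js_object and the sample string are hypothetical names made up for illustration.

```python
import json
import re


def extract_js_object(text, var_name):
    """Return the JSON value assigned to `var_name` in `text` (hypothetical helper)."""
    # Locate "var <name> =" and consume any whitespace before the value.
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', text)
    if match is None:
        raise ValueError(f"{var_name} not found")
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text follows it.
    value, _end = json.JSONDecoder().raw_decode(text, match.end())
    return value


# Made-up sample; on the real page you would pass r.text or the script text.
sample = 'var masterCompanyData = {"a": [1, 2]};\nvar zoom = 1;'
result = extract_js_object(sample, "masterCompanyData")
print(result)
```

This would fail with a json.JSONDecodeError if the value were a JavaScript object literal that is not valid JSON (for example, one with unquoted keys).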
