I want to extract the JSON content inside the HTML comment in a <script> tag using BeautifulSoup:
<script data_id="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo", "salary":56000, "married":true}}--></script>
The output should be as follows:
Name: sonoo
Salary: 56000
Married: True
I have tried the following:
from bs4 import BeautifulSoup, Comment
import json
soup = BeautifulSoup(webpage, "html.parser")
data = soup.find("script", {"type": "application/json", "data_id": "dfsfre2323", "data_key": "23424sfsfsfdafd"})
comment = soup.find(text=lambda text: isinstance(text, Comment))
I don't get anything back: comment is always None.
Any help is appreciated.
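BeautifulSoup doesn't parse the content of a <script> tag as HTML; it keeps it as raw text. The <!-- ... --> inside it therefore never becomes a Comment object, so your .find(text=...) won't find anything. Parse the script's string with BeautifulSoup a second time, then call .find() on that: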
import json
from bs4 import BeautifulSoup, Comment
txt = '''
<script data_id ="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo", "salary":56000, "married":true}}
--></script>'''
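soup = BeautifulSoup(txt, "html.parser")
# Locate the <script> tag by its type and data_* attributes
script = soup.find("script", {"type": "application/json", "data_id": "dfsfre2323", "data_key": "23424sfsfsfdafd"})
# script.string is raw text, so parse it again to turn <!-- ... --> into a Comment node
comment = BeautifulSoup(script.string, "html.parser").find(string=lambda t: isinstance(t, Comment))
# The comment text is the JSON payload
data = json.loads(comment)
print(json.dumps(data, indent=4))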
Prints:
{
    "employee": {
        "name": "sonoo",
        "salary": 56000,
        "married": true
    }
}
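To print the fields in the Name/Salary/Married format the question asks for, a minimal sketch is to index into the parsed dict. The comment string below is a hard-coded stand-in for the comment variable found above, so the snippet runs on its own; the list of keys is an assumption based on the sample JSON.

```python
import json

# Stand-in for the `comment` text extracted above (hard-coded for this sketch)
comment = '\n{"employee": {"name":"sonoo", "salary":56000, "married":true}}\n'

employee = json.loads(comment)["employee"]

# json.loads maps JSON true to Python True, so "married" prints as True
lines = [f"{key.capitalize()}: {employee[key]}" for key in ("name", "salary", "married")]
print("\n".join(lines))
# Name: sonoo
# Salary: 56000
# Married: True
```

If the real JSON has more fields, extend the key tuple, or iterate over employee.items() to print every field.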
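You can see the cause directly with the standard library's html.parser module, which BeautifulSoup's "html.parser" backend is built on. ScriptInspector is a hypothetical helper written only for this check. It shows that no comment is reported inside <script> and that the script text still starts with a literal <!--. As an alternative to re-parsing, it also strips the markers by hand to get the same JSON. This sketch assumes Python 3.9+ for str.removeprefix/removesuffix.

```python
import json
from html.parser import HTMLParser

page = '''<script data_id="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo", "salary":56000, "married":true}}
--></script>'''


class ScriptInspector(HTMLParser):
    """Hypothetical helper: records every comment, plus raw text inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.comments = []
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        if self.in_script:
            self.script_text.append(data)


parser = ScriptInspector()
parser.feed(page)
parser.close()

raw = "".join(parser.script_text)
print(parser.comments)          # [] : no comment is reported inside <script>
print(raw.startswith("<!--"))   # True : the markers are still plain text

# Alternative to re-parsing: strip the comment markers by hand
payload = raw.strip().removeprefix("<!--").removesuffix("-->")
print(json.loads(payload)["employee"]["name"])  # sonoo
```

Stripping by hand assumes the script body is exactly one comment wrapping the JSON. Re-parsing with BeautifulSoup, as in the answer, does not depend on that exact layout.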