Reputation: 15098
I think I made a mistake somewhere in this code, because I'm not getting the JSON I expect.
When I print scripts[0] itself, I get the script tag with the JSON inside it,
but when I print scripts[0].text, nothing appears.
I want the JSON alone.
Code:
from bs4 import BeautifulSoup
from urllib.parse import quote_plus
import requests
import selenium.webdriver as webdriver

base_url = 'https://www.instagram.com/{}'
search = input('Enter the instagram account: ')
final_url = base_url.format(quote_plus(search))
response = requests.get(final_url)
print(response.status_code)
if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html)
    scripts = bs_html.select('script[type="application/ld+json"]')
    print(scripts[0].text)
Upvotes: 2
Views: 342
Reputation: 116
Change the line print(scripts[0].text) to print(scripts[0].string).
scripts[0] is a Beautiful Soup Tag object, and its string contents can be accessed through the .string property.
Source: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string
If you then want to parse the string as JSON so that you can access the data, you can do something like this:
...
# requires "import json" alongside the other imports
if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html)
    scripts = bs_html.select('script[type="application/ld+json"]')
    json_output = json.loads(scripts[0].string)
Then, for example, if you run print(json_output['name']), you should get the name on the account, assuming the JSON contains a 'name' key.
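To try this without hitting the network, here is a minimal, self-contained sketch of the same idea. The SAMPLE_HTML string and the extract_ld_json helper are hypothetical names made up for illustration, and the sample JSON is not what Instagram actually returns. It also guards against the tag being missing, which can happen if the page served to requests differs from what you see in a browser.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; not real Instagram output.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Person", "name": "Example Account"}</script>
</head><body></body></html>
"""


def extract_ld_json(html):
    """Return the first ld+json script parsed as a dict, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one('script[type="application/ld+json"]')
    # .string is None when the tag is missing or has no single string child.
    if tag is None or tag.string is None:
        return None
    return json.loads(tag.string)


data = extract_ld_json(SAMPLE_HTML)
print(data["name"])
```

Passing "html.parser" explicitly also avoids the warning Beautiful Soup emits when no parser is named.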
Upvotes: 3