Delrius Euphoria

Reputation: 15098

Not getting json when using .text in bs4

I think I made a mistake somewhere in this code, because I'm not getting the JSON when I print it; in fact, I get nothing. When I index into the list of scripts and print the tag itself I can see the JSON, but when I use .text nothing appears. I want the JSON alone.

Code:

from bs4 import BeautifulSoup
from urllib.parse import quote_plus
import requests
import selenium.webdriver as webdriver

base_url = 'https://www.instagram.com/{}'

search = input('Enter the instagram account: ')

final_url = base_url.format(quote_plus(search))

response = requests.get(final_url)

print(response.status_code)
if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html)
    scripts = bs_html.select('script[type="application/ld+json"]')
    print(scripts[0].text)

Upvotes: 2

Views: 342

Answers (1)

schoinh

Reputation: 116

Change the line print(scripts[0].text) to print(scripts[0].string).

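scripts[0] is a Beautiful Soup Tag object, and the string it contains can be read through its .string property. Note that .string is None when the tag has more than one child, so it only works for tags that wrap a single piece of text, as this script tag does.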

Source: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string
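
To try this without requesting a live page, here is a minimal sketch that runs against a hard-coded HTML snippet. The markup, the account name, and the variable names are made up for illustration, and passing 'html.parser' explicitly is my own choice rather than something from the question. It shows .string returning the script's contents, and returning None for a tag that has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded profile page.
sample_html = """
<html><head>
<script type="application/ld+json">{"name": "Example Account"}</script>
</head><body><p>Hello <b>world</b></p></body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# The script tag has exactly one child (its text), so .string returns it.
script_tag = soup.select('script[type="application/ld+json"]')[0]
raw = script_tag.string
print(raw)

# The <p> tag has two children (a string and a <b> tag), so .string is None.
paragraph_string = soup.p.string
print(paragraph_string)
```

If .string comes back as None on a real page, that usually means the selected tag is not the one you expected or it contains more than a single string.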

If you then want to parse that string as JSON so that you can access the data as a Python object, you can do something like this:

import json

...

if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html)
    scripts = bs_html.select('script[type="application/ld+json"]')
    # .string holds the raw JSON text; json.loads decodes it into a Python object
    json_output = json.loads(scripts[0].string)

Then, for example, if you run print(json_output['name']) you should be able to access the name on the account.
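
Because .string can be None and a page may not contain valid JSON at all, a slightly more defensive version of the parsing step might look like the sketch below. The helper name parse_ld_json and the sample inputs are hypothetical, not part of the original code.

```python
import json


def parse_ld_json(raw):
    """Return the decoded JSON object, or None if raw is missing or invalid."""
    if raw is None:
        # The tag had no single string child, so there is nothing to parse.
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The text was present but was not valid JSON.
        return None


data = parse_ld_json('{"name": "Example Account"}')
missing = parse_ld_json(None)
broken = parse_ld_json("not json")

if data is not None:
    # .get avoids a KeyError if the 'name' field is absent.
    print(data.get("name"))
```

In the answer's code you would call it as parse_ld_json(scripts[0].string) and check for None before indexing into the result.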

Upvotes: 3
