linacarrillo

Reputation: 79

Extracting text from bs4.element.Tag returns empty string

I am trying to extract all of the answers from a Quora URL by following a tutorial. My code looks like this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.quora.com/Should-I-move-to-London'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
# Find the first JSON-LD <script> tag on the page
answers = soup.find("script", {"type": "application/ld+json"})
answers

However, when I try to get the text from answers (a bs4.element.Tag object), it comes back empty. How can I extract all of the answers? I also tried the following:

data = json.loads(soup.find('script', type='application/ld+json').text)

But I get the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I attached a screenshot with the structure of the bs4 body.

Upvotes: 3

Views: 1084

Answers (2)

baduker

Reputation: 20052

Use .string instead of .text to get the contents of the script tag.

Here's how:

import json

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://www.quora.com/Should-I-move-to-London').content, 'html.parser')
answers = soup.find("script", {"type": "application/ld+json"})
data = json.loads(answers.string)
print(data["mainEntity"]["answerCount"])

For example, this prints:

12

To print the answers, use this:

for number, answer in enumerate(data["mainEntity"]["suggestedAnswer"], start=1):
    print(f"Answer: {number}. | Upvote count: {answer['upvoteCount']}")
    print(answer["text"].strip())
    print("-" * 80)
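
The loop above assumes every answer carries both "text" and "upvoteCount". As a minimal sketch, assuming some entries might lack a key, the same iteration can be written defensively with dict.get. The sample dictionary and the iter_answers helper are hypothetical and only mimic the shape of the payload used above; they are not real Quora data.

```python
# Hypothetical sample shaped like the payload used above; not real Quora data.
sample = {
    "mainEntity": {
        "answerCount": 2,
        "suggestedAnswer": [
            {"text": "  Yes, if you like rain.  ", "upvoteCount": 5},
            {"text": "It depends on your job."},  # no upvoteCount key
        ],
    }
}


def iter_answers(data):
    """Yield (number, upvotes, text) per answer, tolerating missing keys."""
    answers = data.get("mainEntity", {}).get("suggestedAnswer", [])
    for number, answer in enumerate(answers, start=1):
        upvotes = answer.get("upvoteCount", 0)  # assumed default of 0
        text = answer.get("text", "").strip()
        yield number, upvotes, text


for number, upvotes, text in iter_answers(sample):
    print(f"Answer: {number}. | Upvote count: {upvotes}")
    print(text)
    print("-" * 80)
```

Passing the parsed data from the snippet above instead of sample should behave the same way, as long as the page still exposes the same structure.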

Upvotes: 2

MendelG

Reputation: 20038

You need to use the .string attribute (it is a property, not a method):

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.quora.com/Should-I-move-to-London'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
answers = soup.find("script", {"type": "application/ld+json"})

json_data = json.loads(answers.string)
print(type(json_data))  # <class 'dict'>
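
A likely explanation for the JSONDecodeError in the question, sketched under assumptions: the Beautiful Soup documentation notes that from version 4.9.0, with html.parser or lxml, the contents of script tags are generally not treated as text, so .text can come back as an empty string, and json.loads on an empty string raises exactly that error. The behaviour may differ between Beautiful Soup versions. The load_ld_json helper below is hypothetical; it only guards against an empty or missing string before parsing.

```python
import json


def load_ld_json(raw):
    """Parse a JSON-LD string, returning None when there is nothing to parse."""
    # raw may be None (Tag.string on a tag without a single string child)
    # or "" (what .text apparently returned in the question).
    if not raw or not raw.strip():
        return None
    return json.loads(raw)


# Reproduce the error from the question with an empty string.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print(error_message)

print(load_ld_json(""))                     # nothing to parse
print(load_ld_json('{"@type": "QAPage"}'))  # parsed into a dict
```

In the scraping code this would be called as load_ld_json(answers.string), with a None result signalling that the tag was empty or that the page layout changed.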

Upvotes: 0
