Reputation: 79
I am trying to extract all of the answers from a Quora URL, following a tutorial. My code looks like this:
import requests
from bs4 import BeautifulSoup

url = 'https://www.quora.com/Should-I-move-to-London'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
answers = soup.find("script", {"type": "application/ld+json"})
answers
However, when I try to get the text from answers (a bs4.element.Tag object), it comes back empty. How can I extract all of the answers? I also tried the following:
data = json.loads(soup.find('script', type='application/ld+json').text)
But I am getting the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I attached a screenshot with the structure of the bs4 body.
Upvotes: 3
Views: 1084
Reputation: 20052
You have to use .string to get the contents of the <script> tag. Here's how:
import json
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get('https://www.quora.com/Should-I-move-to-London').content, 'html.parser')
answers = soup.find("script", {"type": "application/ld+json"})
data = json.loads(answers.string)
print(data["mainEntity"]["answerCount"])
For example, this prints:
12
To print the answers, use this:
for number, answer in enumerate(data["mainEntity"]["suggestedAnswer"], start=1):
    print(f"Answer: {number}. | Upvote count: {answer['upvoteCount']}")
    print(answer["text"].strip())
    print("-" * 80)
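If an entry is missing a field such as upvoteCount, the loop above will raise a KeyError. Here is a minimal sketch of a more defensive variant, assuming the parsed JSON-LD keeps the same mainEntity / suggestedAnswer layout. The name extract_answers is a hypothetical helper, and the sample dict is made up for illustration, not real Quora output:

```python
def extract_answers(data):
    """Return (upvote_count, text) pairs from a parsed JSON-LD dict.

    Hypothetical helper: assumes the "mainEntity" -> "suggestedAnswer"
    layout used above and tolerates missing keys.
    """
    main_entity = data.get("mainEntity") or {}
    answers = main_entity.get("suggestedAnswer") or []
    results = []
    for answer in answers:
        upvotes = answer.get("upvoteCount", 0)   # default when the key is absent
        text = (answer.get("text") or "").strip()
        results.append((upvotes, text))
    return results


# Made-up sample data for illustration only (not real Quora output).
sample = {
    "mainEntity": {
        "answerCount": 2,
        "suggestedAnswer": [
            {"upvoteCount": 5, "text": "  Yes, if you can afford it.  "},
            {"text": "It depends on your job."},
        ],
    }
}

for number, (upvotes, text) in enumerate(extract_answers(sample), start=1):
    print(f"Answer: {number}. | Upvote count: {upvotes}")
    print(text)
    print("-" * 80)
```

With real data you would pass in the dict returned by json.loads instead of sample.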
Upvotes: 2
Reputation: 20038
You need to use the .string attribute:
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.quora.com/Should-I-move-to-London'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
answers = soup.find("script", {"type": "application/ld+json"})
json_data = json.loads(answers.string)
print(type(json_data))
Output:
<class 'dict'>
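The JSONDecodeError in the question is what json.loads raises when it is handed an empty string, which appears to be what .text returned there. Below is a small sketch of a guard, assuming you pass it whatever .string gives you (which can be None when the tag does not have a single string child). The name parse_json_ld is a hypothetical helper, not part of bs4 or json:

```python
import json


def parse_json_ld(raw):
    """Parse a JSON-LD string, returning None if it is missing or invalid."""
    if not raw or not raw.strip():
        # json.loads("") raises "Expecting value: line 1 column 1 (char 0)",
        # so bail out early on None or empty input.
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The content was not valid JSON.
        return None


print(parse_json_ld(""))
print(parse_json_ld('{"mainEntity": {"answerCount": 12}}'))
```

In the code above you could then write json_data = parse_json_ld(answers.string) and check for None before indexing into it.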
Upvotes: 0