Raymont

Reputation: 293

BeautifulSoup with Python to get articleBody

I want to extract the content found in articleBody, but my code doesn't work. What am I missing to retrieve the article text?

from bs4 import BeautifulSoup
import requests

link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'
response = requests.get(link)
soup = BeautifulSoup(response.content, "html.parser")

label = soup.find("application/ld+json", text="articleBody:")
label

Upvotes: 1

Views: 470

Answers (1)

MendelG

Reputation: 20048

The first argument to find() is a tag name, not an attribute value, so search with the keyword argument type="application/ld+json" instead.
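To illustrate the difference, here is a minimal sketch on a small inline document. The HTML string is a hypothetical stand-in for the page, and the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real markup may differ.
html = """
<html><head>
<script type="application/ld+json">{"articleBody": "Sample text"}</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The first positional argument of find() is a tag NAME, so this looks for
# an element literally named "application/ld+json", which does not exist.
by_name = soup.find("application/ld+json")

# Passing type= as a keyword argument filters on the attribute instead.
by_attr = soup.find("script", type="application/ld+json")

print(by_name)         # None
print(by_attr.string)  # the raw JSON text inside the <script> tag
```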

The data you're looking at is in JSON format. You can convert it to a Python dictionary using the json module:

import json
import requests
from bs4 import BeautifulSoup

link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'

soup = BeautifulSoup(requests.get(link).content, "html.parser")

json_data = json.loads(soup.find(type="application/ld+json").string)

print(type(json_data))
print(json_data['description'])

Output:

<class 'dict'>
Un equipo de investigadores de la Universidad de Viena, .....
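The snippet above prints the description key. Since the question asks for articleBody, here is a hedged sketch of how that key could be looked up when a page carries more than one JSON-LD block. The sample HTML and the helper name find_article_body are assumptions for illustration, and whether the real page exposes articleBody in its JSON-LD is not confirmed here.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real page.
html = """
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">
{"@type": "NewsArticle", "description": "Short summary", "articleBody": "Full text of the note."}
</script>
"""

def find_article_body(soup):
    """Return the first articleBody found in any JSON-LD block, else None."""
    for tag in soup.find_all("script", type="application/ld+json"):
        if not tag.string:
            continue  # empty <script> tag
        try:
            data = json.loads(tag.string)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # A block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "articleBody" in item:
                return item["articleBody"]
    return None

soup = BeautifulSoup(html, "html.parser")
article_body = find_article_body(soup)
print(article_body)
```

Returning None instead of raising KeyError lets the caller fall back to another approach when no block contains the key.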

Or you can use a CSS selector to get all <p> tags that are direct children of the element with class body-nota:

soup = BeautifulSoup(requests.get(link).content, "html.parser")

for tag in soup.select(".body-nota > p"):
    print(tag.text)
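The > in the selector matches direct children only. The sketch below shows the effect on hypothetical markup (only the class name body-nota comes from the answer) and joins the matched paragraphs into one string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name body-nota comes from the answer.
html = """
<div class="body-nota">
  <p>First paragraph.</p>
  <div><p>Nested paragraph.</p></div>
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".body-nota > p" keeps only <p> tags that are direct children.
direct = [p.get_text(strip=True) for p in soup.select(".body-nota > p")]

# ".body-nota p" (no ">") also matches <p> tags nested deeper.
descendants = [p.get_text(strip=True) for p in soup.select(".body-nota p")]

# Join the direct paragraphs into a single block of article text.
article_text = "\n".join(direct)

print(direct)
print(descendants)
```

If the page wraps some paragraphs in extra containers, the descendant form may be the better fit.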

Upvotes: 1
