The idea is to extract the content found in articleBody, but the code doesn't work for me. What am I missing to retrieve the article text?
from bs4 import BeautifulSoup
import requests
link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'
response = requests.get(link)
soup = BeautifulSoup(response.content, "html.parser")
label = soup.find("application/ld+json", text="articleBody:")
label
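soup.find("application/ld+json", ...) treats its first argument as a tag name, so it looks for a tag literally called application/ld+json and finds nothing. Search by the type attribute instead, using type="application/ld+json".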
The data you're looking at is in JSON format; you can convert it to a Python dictionary with the json module:
import json
import requests
from bs4 import BeautifulSoup
link = 'https://www.clarin.com/sociedad/coronavirus-estudio-dice-acciones-sencillas-podrian-efectivas-cuarentenas_0_ZQM2_GZZn.html'
soup = BeautifulSoup(requests.get(link).content, "html.parser")
# Find the first <script type="application/ld+json"> tag and parse its text as JSON
json_data = json.loads(soup.find(type="application/ld+json").string)
print(type(json_data))
print(json_data['description'])
Output:
<class 'dict'>
Un equipo de investigadores de la Universidad de Viena, .....
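The question asks for articleBody, while the snippet above prints description. A page can carry several ld+json blocks, and the article object may be nested (for example inside a list or an @graph array), so indexing one fixed key can miss it. Below is a minimal standard-library sketch that searches parsed JSON recursively for a key. The helper names find_key and first_article_body are hypothetical, and the sample JSON strings are made-up stand-ins, not the real Clarín markup.

```python
import json


def find_key(data, key):
    """Depth-first search through parsed JSON (dicts and lists).

    Returns the first non-null value stored under `key`, or None if absent.
    """
    if isinstance(data, dict):
        if data.get(key) is not None:
            return data[key]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        return None  # strings, numbers, booleans, null: nothing to search
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def first_article_body(json_strings):
    """Parse each JSON-LD string and return the first articleBody found."""
    for raw in json_strings:
        body = find_key(json.loads(raw), "articleBody")
        if body is not None:
            return body
    return None


# Hypothetical JSON-LD snippets standing in for the text of the page's
# <script type="application/ld+json"> tags (not real page data).
ld_json_strings = [
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}',
    '{"@context": "https://schema.org", "@graph": [{"@type": "NewsArticle",'
    ' "description": "Short summary", "articleBody": "Full text of the note."}]}',
]

print(first_article_body(ld_json_strings))
```

With the real page, the list of strings would come from the tags BeautifulSoup finds, e.g. [tag.string for tag in soup.find_all(type="application/ld+json")]; note that tag.string can be None if a script tag has more than one child node, so you may want to skip those.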
Or you can use a CSS selector to find all <p> tags that are direct children of an element with the class body-nota:
soup = BeautifulSoup(requests.get(link).content, "html.parser")
for tag in soup.select(".body-nota > p"):
    print(tag.text)
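If BeautifulSoup is not available, the same selection can be approximated with the standard library's html.parser. This is a swapped-in technique, not what the answer uses: a minimal sketch that collects the text of <p> elements that are direct children of an element whose class list contains body-nota, which is what .body-nota > p means. The class BodyNotaParagraphs and the function body_nota_paragraphs are hypothetical names, and the sketch assumes reasonably well-formed markup (it does not handle implicitly closed <p> tags).

```python
from html.parser import HTMLParser


class BodyNotaParagraphs(HTMLParser):
    """Collect text of <p> tags that are direct children of a .body-nota element."""

    # Void elements never get a closing tag, so they are kept off the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []        # open elements as (tag, has_body_nota_class)
        self.paragraphs = []   # collected paragraph texts
        self._buf = None       # text pieces of the <p> currently being captured
        self._p_depth = None   # stack length right after that <p> was opened

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        parent_is_nota = bool(self.stack) and self.stack[-1][1]
        self.stack.append((tag, "body-nota" in classes))
        if tag == "p" and parent_is_nota and self._buf is None:
            self._buf = []
            self._p_depth = len(self.stack)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        else:
            return
        # The captured <p> has been closed: store its text.
        if self._buf is not None and len(self.stack) < self._p_depth:
            self.paragraphs.append("".join(self._buf))
            self._buf = None
            self._p_depth = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def body_nota_paragraphs(html):
    parser = BodyNotaParagraphs()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = ('<div class="body-nota"><p>First <b>bold</b> para.</p>'
          '<div><p>Nested, skipped</p></div><p>Second para.</p></div>'
          '<p>Outside</p>')
for text in body_nota_paragraphs(sample):
    print(text)
```

The nested <p> is skipped because its parent is a plain <div>, matching the child combinator >; a descendant selector (.body-nota p) would include it.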