Reputation: 53
I am trying to scrape a website, but I am having difficulty collecting what I want:
import requests
from bs4 import BeautifulSoup
import time
from datetime import date, datetime, timedelta
url = 'https://cerbios.swiss/news-events/news/'
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
results_date = soup.find(class_='entry-title')
print(results_date)
That is the code I have, and its output is:
<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>
This is good, but what I really want is the "href", so that the output is just the URL. I don't know how to do it. I tried this line: results_url = soup.find(class_='entry-title')['href'] but it does not work, because the element with the class 'entry-title' does not have an "href" attribute. Any help would be greatly appreciated.
Upvotes: 0
Views: 63
Reputation: 1415
You're trying to access an href attribute on the <h3> element, which does not exist. You can either keep using find() to get to the <a> element or use a more specific selector.
soup.find(class_='entry-title').find('a')['href']
or
soup.select_one('h3.entry-title a')['href']
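For a self-contained check, here is a minimal sketch that runs the selector against the markup printed in the question instead of the live page, so it needs no network access. The helper names first_entry_url and all_entry_urls are hypothetical, made up for illustration. It also assumes you may want every matching link on the page, not just the first one.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html = """
<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>
"""

soup = BeautifulSoup(html, "html.parser")


def first_entry_url(soup):
    """Return the href of the first entry-title link, or None if there is none."""
    link = soup.select_one("h3.entry-title a")
    # select_one() returns None when nothing matches, so guard before reading href.
    return link.get("href") if link is not None else None


def all_entry_urls(soup):
    """Return the href of every link nested inside an h3.entry-title element."""
    # The [href] part skips any <a> that has no href attribute.
    return [a["href"] for a in soup.select("h3.entry-title a[href]")]


print(first_entry_url(soup))
print(all_entry_urls(soup))
```

The None check matters on a real page: if the site changes its markup and nothing matches, indexing ['href'] directly on the result raises a TypeError.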
Upvotes: 2