Reputation: 4450
I'm trying to extract a link from a page with Python and the BeautifulSoup library, but I'm stuck. The link is on the following page, in the sidebar area, directly underneath the h4 subtitle "Original Source":
http://www.eurekalert.org/pub_releases/2016-06/uonc-euc062016.php
I've managed to isolate the link (mostly), but I'm not sure how to narrow my targeting further to extract just the link. Here's my code so far:
import requests
from bs4 import BeautifulSoup
url = "http://www.eurekalert.org/pub_releases/2016-06/uonc-euc062016.php"
data = requests.get(url)
soup = BeautifulSoup(data.text, 'lxml')
source_url = soup.find('section', class_='widget hidden-print').find('div', class_='widget-content').findAll('a')[-1]
print(source_url)
I am currently getting the full HTML of the last element I've isolated, when I just want the link itself. Of note, this is the only link on the page I'm trying to get.
Upvotes: 1
Views: 1643
Reputation: 5157
You almost got it!!
SOLUTION 1:
You just have to read the .text attribute of the tag you've assigned to source_url. It returns the visible text of the link.
So instead of:
print(source_url)
You should use:
print(source_url.text)
Output:
SOLUTION 2:
You should call source_url.get('href') to get only the href attribute of the element you selected with findAll.
print(source_url.get('href'))
Output:
Upvotes: 0
Reputation: 2310
You're looking for the link, which is the href HTML attribute. source_url is a bs4.element.Tag, which has a get method:
source_url.get('href')
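To make the difference concrete, here is a minimal sketch that runs against a small made-up HTML fragment rather than the live page. The fragment, its link text, and the example.com URL are assumptions for illustration only; the real sidebar markup may differ. It uses the built-in html.parser so that lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the sidebar widget;
# the real page's markup may differ.
html = """
<section class="widget hidden-print">
  <div class="widget-content">
    <h4>Original Source</h4>
    <p><a href="http://example.com/original-story">Read the original story</a></p>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml required

# Same targeting as in the question: last <a> inside the widget content.
source_url = (
    soup.find("section", class_="widget hidden-print")
        .find("div", class_="widget-content")
        .find_all("a")[-1]
)

print(source_url)              # the whole <a> tag, as in the question
print(source_url.text)         # only the visible link text
print(source_url.get("href"))  # only the URL; None if the attribute is missing
print(source_url["href"])      # same URL, but raises KeyError if it is missing
```

If the link text on the page happens to be the URL itself, .text and .get('href') will print the same thing, but only the href attribute is guaranteed to hold the link target.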
Upvotes: 1