Reputation: 395
Let's say I have a link like this:
link = '<a href="some text">...</a>'
Is there any way I can retrieve the text from the anchor's href attribute, so that the result is something like this:
hrefText = 'some text'
And thank you in advance
Upvotes: 1
Views: 1143
Reputation: 1309
You can use the bs4 and requests libraries for this.
import requests
from bs4 import BeautifulSoup
url = 'https://examplesite.com/'
source = requests.get(url)
text = source.text
soup = BeautifulSoup(text, "html.parser")
# href=True skips anchors that have no href attribute
for link in soup.find_all('a', href=True):
    href = link['href']
    print("hrefText =", href)
Hope this helps :)
Upvotes: 1
Reputation: 1667
Although you could split the string or use a regular expression, a more modular and powerful tool is BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/
Sample code:
from bs4 import BeautifulSoup
link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])
Alternatively, for a single function, you can do this:
from bs4 import BeautifulSoup
def getHref(link):
    soup = BeautifulSoup(link, "html.parser")
    # first anchor that carries an href attribute
    return soup.find_all('a', href=True)[0]['href']
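Note that indexing with [0] raises IndexError when the string contains no anchor with an href. A minimal sketch of a more forgiving variant, assuming you would rather get None in that case (the name get_href_or_none is hypothetical, not part of the answer above):

```python
from bs4 import BeautifulSoup

def get_href_or_none(link):
    """Return the href of the first <a> that has one, or None."""
    soup = BeautifulSoup(link, "html.parser")
    # find() returns None when nothing matches, unlike find_all(...)[0]
    anchor = soup.find('a', href=True)
    return anchor['href'] if anchor is not None else None

print(get_href_or_none('<a href="some text">...</a>'))  # some text
print(get_href_or_none('<p>no link here</p>'))          # None
```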
Upvotes: 1
Reputation: 22438
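Here is one way, using a regular expression:
import re
link = '<a href="some text">...</a>'  # the string from the question
print(re.search(r'(?<=<a href=")[^"]+', link).group(0))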
Or,
print(re.search(r'<a\s+href="([^"]+)', link).group(1))
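Both patterns assume that href is the first attribute and is double-quoted, and both raise AttributeError when nothing matches, because re.search returns None. A rough sketch of a slightly more tolerant version, assuming single-quoted values and other attributes before href should also work (HREF_RE and href_from_anchor are made-up names for illustration; a regex is still not a full HTML parser):

```python
import re

# Matches href="..." or href='...' anywhere inside an opening <a ...> tag.
# group(1) is the quote character, group(2) is the attribute value.
HREF_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

def href_from_anchor(link):
    """Return the href value of the first matching anchor, or None."""
    match = HREF_RE.search(link)
    return match.group(2) if match else None

link = '<a href="some text">...</a>'
print(href_from_anchor(link))  # some text
```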
Upvotes: 1