user3885884

Reputation: 395

How to retrieve the text of an anchor's href attribute in Python

Let's say I have a link like this:

link = '<a href="some text">...</a>'

Is there a way to retrieve the value of the anchor's href attribute, so that the result looks like this:

hrefText = 'some text'

Thank you in advance.

Upvotes: 1

Views: 1143

Answers (3)

Satyaki Sanyal

Reputation: 1309

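You can use the bs4 (BeautifulSoup) and requests libraries for this. The snippet below fetches a page and prints the href of every anchor on it.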

import requests
from bs4 import BeautifulSoup

url = 'https://examplesite.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# href=True skips anchors that have no href attribute
for link in soup.find_all('a', href=True):
    href = link['href']
    print("hrefText =", href)

Hope this helps :)

Upvotes: 1

Brian

Reputation: 1667

Although you could split the string or use a regular expression, a more modular and powerful option is an HTML parser such as

BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/

Sample code:

from bs4 import BeautifulSoup

link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
# href=True matches only anchors that actually have an href attribute
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])

Alternatively, for a single function, you can do this:

from bs4 import BeautifulSoup

def getHref(link):
    soup = BeautifulSoup(link, "html.parser")
    # find() returns the first matching anchor, or None if there is none
    anchor = soup.find('a', href=True)
    return anchor['href'] if anchor else None
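
If installing BeautifulSoup is not an option, the standard library's html.parser module can probably do the same job. The following is only a minimal sketch of that swapped-in approach; the names HrefCollector and get_hrefs are made up for illustration and are not part of any library:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


def get_hrefs(html):
    """Return a list with the href of each anchor found in html."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


link = '<a href="some text">...</a>'
print(get_hrefs(link))
```

It returns a list, so an input without anchors simply gives an empty list instead of raising.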

Upvotes: 1

Jahid

Reputation: 22438

One way is a regular expression with a lookbehind:

import re
link = '<a href="some text">...</a>'
print(re.search(r'(?<=<a href=")[^"]+', link).group(0))

Or, with a capturing group instead of the lookbehind:

print(re.search(r'<a\s+href="([^"]+)', link).group(1))
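
Both patterns assume that href is the first attribute, that it is double-quoted, and that a match exists; when nothing matches, re.search returns None and calling .group on it raises AttributeError. A somewhat more tolerant variant might look like the sketch below. The helper name extract_href and the pattern are assumptions for illustration, and a regex is still no substitute for a real HTML parser:

```python
import re

# Hypothetical pattern: allows other attributes before href and accepts
# either single or double quotes around the value.
HREF_RE = re.compile(
    r'<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def extract_href(html):
    """Return the href of the first matching anchor, or None if no match."""
    match = HREF_RE.search(html)
    return match.group(2) if match else None


link = '<a href="some text">...</a>'
print(extract_href(link))                # some text
print(extract_href('<p>no anchor</p>'))  # None
```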

Upvotes: 1
