tipu

Reputation: 9604

Beautiful Soup: getting an href based on the link text

Say there's a page with hundreds of links, each with unique text in the a tag. How can I specify an a tag's text and then get the href from there? For example,

for a in soup.findAll('a', href=True):
  print(a['href'])

This gets all the href throughout the page, which is overkill. When I do this:

for a in soup.findAll('a', href=True, text="Some Value"):
  print(a['href'])

I can't grab the href attribute because the call no longer returns a Tag object, but a NavigableString instead. Any idea how I can achieve what I want?

Upvotes: 7

Views: 3781

Answers (3)

kiwironnie

Reputation: 1

These worked for me, when looking for 'See all' at the beginning of the text in the tag:

import re

for tag in soup.findAll(lambda tag: tag.name == 'a' and re.search('^See all', tag.text), href=True):
    print('href:', tag['href'])

for a in soup.findAll('a', href=True):
    if re.search('^See all', a.text):
        print('href:', a['href'])
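
For reference, here is a self-contained sketch of the same prefix match. It assumes the bs4 package (Beautiful Soup 4) rather than version 3, the HTML snippet is made up, and the helper name hrefs_with_text_prefix is hypothetical:

```python
import re
from bs4 import BeautifulSoup

# Made-up sample page, for illustration only.
HTML = """
<a href="/all/books">See all books</a>
<a href="/all/films">See all films</a>
<a href="/about">About us</a>
<a>See all (no href)</a>
"""

def hrefs_with_text_prefix(soup, prefix):
    """Return the href of every <a href=...> whose text starts with prefix."""
    pattern = re.compile('^' + re.escape(prefix))
    return [a['href']
            for a in soup.find_all('a', href=True)  # skips <a> tags without href
            if pattern.search(a.get_text())]

soup = BeautifulSoup(HTML, 'html.parser')
print(hrefs_with_text_prefix(soup, 'See all'))
```

re.escape keeps the prefix literal, so characters such as '(' or '.' in the link text are not read as regex syntax.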

Upvotes: 0

jcollado

Reputation: 40384

Instead of passing the text parameter, you can pass a callable as the name parameter that checks both the tag name and the text:

for tag in soup.findAll(lambda tag: (tag.name == 'a'
                                     and tag.text == 'Some Value'),
                        href=True):
    print(tag['href'])

This way, the returned value is a Tag instead of a NavigableString.

Note also that, according to the documentation:

If you use text, then any values you give for name and the keyword arguments are ignored.

So probably the second example in your question doesn't work as expected even if you just want to get the NavigableString.
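
A minimal runnable sketch of the difference, assuming Beautiful Soup 4 (the bs4 package, 4.4.0 or later for the string argument) and a made-up HTML snippet. The quoted caveat describes the older text argument; as far as I know, version 4 lets you combine string= with a tag name and then returns the matching tags, so treat the second query as version-specific:

```python
from bs4 import BeautifulSoup

# Made-up sample page, for illustration only.
HTML = """
<a href="/one">Some Value</a>
<a href="/two">Other Value</a>
<a id="anchor">Some Value</a>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# Callable filter: it receives each Tag, so the results are Tag objects.
by_callable = soup.find_all(
    lambda tag: tag.name == 'a' and tag.get_text() == 'Some Value',
    href=True)

# Beautiful Soup 4 also accepts string= alongside a tag name; it then returns
# the tags whose .string matches, not the strings themselves.
by_string = soup.find_all('a', href=True, string='Some Value')

print([tag['href'] for tag in by_callable])
print([tag['href'] for tag in by_string])
```

Both queries skip the third link because it has no href. Note that string= compares against a tag's .string, which is None when the tag has more than one child, so the callable with get_text() is the more forgiving of the two for links that contain nested markup.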

Upvotes: 5

Zsolt Botykai

Reputation: 51603

You can do at least something like:

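for a in soup.findAll('a', href=True):
    # tag_to_string is not a Beautiful Soup method; this assumes the code runs
    # inside a class that provides it (calibre's BasicNewsRecipe does).
    # Outside such a class, comparing a.text is a close substitute.
    if self.tag_to_string(a) == "Some Value":
        print(a['href'])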

But there are other ways.

HTH

Upvotes: 1
