Reputation: 9604
Say there's a page with hundreds of links, each with unique text in the a tag. How can I specify an a tag's text and then get the href from there? For example,
for a in soup.findAll('a', href=True):
    print(a['href'])
This prints every href on the page, which is overkill. When I do this:
for a in soup.findAll('a', href=True, text="Some Value"):
    print(a['href'])
I can't read the href attribute, because the search no longer returns Tag objects but NavigableString objects instead. Any idea how I can achieve what I want?
Upvotes: 7
Views: 3781
Reputation: 1
These worked for me, when looking for 'See all' at the beginning of the text in the tag:
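import re

# Option 1: filter on tag name and text with a callable
for tag in soup.findAll(lambda tag: tag.name == 'a' and re.search('^See all', tag.text), href=True):
    print('href:', tag['href'])

# Option 2: collect every link, then test its text
for a in soup.findAll('a', href=True):
    if re.search('^See all', a.text):
        print('href:', a['href'])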
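For anyone who wants to try the prefix match end to end, here is a minimal sketch. It assumes Beautiful Soup 4.4 or later (imported as bs4), where a compiled pattern can be passed as the string argument together with a tag name. The sample markup and variable names are made up for illustration. Note that this compares against each tag's .string, which is None when the link contains nested tags, so it is not an exact substitute for testing a.text.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<a href="/all-books">See all books</a>
<a href="/about">About us (See all)</a>
<a href="/all-films">See all films</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Match only links whose text starts with "See all".
pattern = re.compile(r"^See all")

# Combining a tag name with string= returns Tag objects, so href is readable.
see_all_hrefs = [a["href"] for a in soup.find_all("a", href=True, string=pattern)]
print(see_all_hrefs)
```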
Upvotes: 0
Reputation: 40384
Instead of passing the text parameter, you can pass a callable as the name parameter that checks both the tag name and the text:
for tag in soup.findAll(lambda tag: (tag.name == 'a'
                                     and tag.text == 'Some Value'),
                        href=True):
    print(tag['href'])
This way, the returned value is a Tag instead of a NavigableString.
Note also that, according to the documentation:
If you use text, then any values you give for name and the keyword arguments are ignored.
So the second example in your question probably doesn't work as expected, even if you only want the NavigableString.
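A self-contained sketch of the callable approach, in case it helps to run it. It assumes Beautiful Soup 4 (imported as bs4) with its find_all spelling. The sample markup and the helper name is_target_link are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<a href="/first">Some Value</a>
<a href="/second">Other Value</a>
<a>Some Value</a>
"""

soup = BeautifulSoup(html, "html.parser")


def is_target_link(tag):
    # True only for <a> tags whose full text equals the target string.
    return tag.name == "a" and tag.get_text() == "Some Value"


# href=True additionally requires the attribute to be present,
# so the third link (no href) is skipped.
hrefs = [tag["href"] for tag in soup.find_all(is_target_link, href=True)]
print(hrefs)
```

Because the filter receives the Tag itself, each result still supports attribute lookups such as tag["href"].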
Upvotes: 5
Reputation: 51603
You can do at least something like:
for a in soup.findAll('a', href=True):
    if self.tag_to_string(a) == "Some Value":
        print(a['href'])
But there are other ways.
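One of those other ways, sketched under the assumption that Beautiful Soup 4 (bs4) is in use: search for the text node itself, then step up to its parent tag. The sample markup and variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = '<p><a href="/first">Some Value</a> <a href="/second">Other</a></p>'

soup = BeautifulSoup(html, "html.parser")

link_hrefs = []
# Searching by string alone yields NavigableString objects, not tags.
for text_node in soup.find_all(string="Some Value"):
    parent = text_node.parent  # the Tag that directly contains the text
    if parent.name == "a" and parent.has_attr("href"):
        link_hrefs.append(parent["href"])

print(link_hrefs)
```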
HTH
Upvotes: 1