Reputation: 2024
I have links in HTML of the form
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
I can get a list of links of the above form using BeautifulSoup.
My code is as follows:
import urllib2
from bs4 import BeautifulSoup
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
listOfLinks = list(soup.findall('a'))
However, I want to find only the links whose link text contains the word "Fetch".
I tried
soup.findAll('a', re.compile(".*Fetch.*"))
but that does not work. How do I select only the a tags that have an href and whose text contains the word "Fetch"?
Upvotes: 2
Views: 9165
Reputation: 12168
import re
soup.findAll('a', text = re.compile("Fetch"))
You can use a regex as the filter; BeautifulSoup applies it with re.search to decide which tags to keep.
The text (or string) argument refers to the text value of the tag, so text = re.compile("Fetch") means "find the tags whose text value contains 'Fetch'".
One more thing: use find_all() or findAll(). findall() is not a method in bs4.
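To tie this back to the question, here is a minimal, self-contained sketch that runs against the two sample links instead of a live URL. It assumes Beautiful Soup 4.4.0 or later, where the text argument is also available under the name string, and it adds href=True so that only a tags carrying an href attribute are returned. The html string and the fetch_links and hrefs names are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question (no network access needed).
html = '''
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
'''

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only tags that have an href attribute;
# string=... is the newer name for text=... (Beautiful Soup >= 4.4.0).
fetch_links = soup.find_all("a", href=True, string=re.compile("Fetch"))

hrefs = [a["href"] for a in fetch_links]
print(hrefs)
```

Note that this filter compares against the tag's .string, so it will likely miss links whose text is split across nested tags.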
Upvotes: 6
Reputation: 57033
A regex may be overkill here, but it allows for possible extensions:
import re

def criterion(tag):
    # Keep tags that have an href attribute and whose text contains "Fetch"
    return tag.has_attr('href') and re.search('Fetch', tag.text)

soup.findAll(criterion)
# [<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>]
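If the regex really is more than you need, the same function-based filter can be sketched with a plain substring test. This is only an illustration: the third and fourth sample links (docid=nested and the name-only anchor) are hypothetical additions, included to show that a function filter based on get_text() also handles link text wrapped in a nested tag and skips a tags without an href.

```python
from bs4 import BeautifulSoup

# First two links are from the question; the last two are hypothetical extras.
html = '''
<a href="/downloadsServlet?docid=abc" target="_blank">Report 1</a>
<a href="/downloadsServlet?docid=ixyz" target="_blank">Fetch Report 2 </a>
<a href="/downloadsServlet?docid=nested"><b>Fetch</b> Report 3</a>
<a name="anchor">Fetch nothing</a>
'''

def has_fetch_link(tag):
    # True only for <a> tags that have an href and whose combined text
    # (including text inside nested tags) contains "Fetch".
    return tag.name == "a" and tag.has_attr("href") and "Fetch" in tag.get_text()

soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(has_fetch_link)

hrefs = [a["href"] for a in matches]
print(hrefs)
```

Checking tag.name explicitly keeps the filter from matching other elements that happen to carry an href, such as link tags.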
Upvotes: 7