Python BeautifulSoup find_all with regex doesn't match text

Question

I have the following HTML code:



                                Shop

I would like to get the anchor tag whose text is Shop, ignoring the whitespace before and after it. I tried the following code, but I keep getting an empty list:

import re
from bs4 import BeautifulSoup
html  = """

                                Shop 
"""
soup = BeautifulSoup(html, 'html.parser')
prog = re.compile(r'\s*Shop\s*')  # raw string, so \s is not read as a string escape
print(soup.find_all("a", string=prog))
# Output: []

I also tried retrieving the text using get_text():

text = soup.find_all("a")[0].get_text()
print(repr(text))
# Output: '\n\n\t\t\t\t\t\t\t\tShop \n'

I then ran the following code to make sure my regex was right, which seems to be the case:

result = prog.match(text)
print(repr(result.group()))
# Output: '\n\n\t\t\t\t\t\t\t\tShop \n'

I also tried selecting span instead of a, but I get the same issue. I'm guessing it's something to do with find_all. I have read the BeautifulSoup documentation, but I still can't find the issue. Any help would be appreciated. Thanks!

Wiktor Stribiżew · Accepted Answer

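The problem here is that the text you are looking for sits in a tag that also contains child tags. When a tag has more than one child (here, a nested tag plus the text), its .string property is None, so the string= filter in find_all has nothing to match your regex against.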
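
To see this behaviour in isolation, here is a minimal sketch. The markup is an assumption (a bare a tag wrapping an empty span and the text), since the attributes of the original snippet are not shown:

```python
from bs4 import BeautifulSoup

# Assumed stand-in markup: an <a> with a child <span> followed by text
html = """
<a>
<span></span>
                                Shop
</a>
"""
soup = BeautifulSoup(html, "html.parser")
a = soup.find("a")

# The tag has several children (text, <span>, text), so .string is None
print(a.string)            # None

# get_text() still works because it joins the text of all descendants
print(repr(a.get_text()))

# A tag whose only child is a single text node does expose .string
single = BeautifulSoup("<a> Shop </a>", "html.parser").a
print(repr(single.string))  # ' Shop '
```

This is why get_text() returned the expected text in the question while find_all with string= returned nothing.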

You can pass a lambda to .find instead. Since you are looking for a fixed string, a plain 'Shop' in t.text check is enough and no regex is needed:

soup.find(lambda t: t.name == "a" and 'Shop' in t.text)
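
If you need the stricter match from the question (the text must be exactly Shop, apart from surrounding whitespace), the same idea works with the regex applied to get_text(). This is only a sketch: the markup is assumed, and is_shop_link is a hypothetical helper name, not part of BeautifulSoup:

```python
import re
from bs4 import BeautifulSoup

# Assumed stand-in markup: two links, only the first is exactly "Shop"
html = """
<a>
<span></span>
                                Shop
</a>
<a><span></span> Shopping cart </a>
"""
soup = BeautifulSoup(html, "html.parser")

shop_re = re.compile(r"\s*Shop\s*")

def is_shop_link(tag):
    # fullmatch requires the whole text to be "Shop" plus optional whitespace
    return tag.name == "a" and shop_re.fullmatch(tag.get_text()) is not None

# find_all accepts a function and keeps the tags for which it returns True
matches = soup.find_all(is_shop_link)
print(len(matches))  # 1

# The substring check also accepts "Shopping cart"
loose = soup.find_all(lambda t: t.name == "a" and "Shop" in t.text)
print(len(loose))    # 2
```

The substring check is simpler, but it also matches links such as "Shopping cart", so which one to use depends on how strict the match needs to be.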
