Reputation: 135
I have the following HTML code:
<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>
I would like to get the anchor tag whose text is Shop, ignoring the whitespace before and after it. I have tried the following code, but I keep getting an empty list:
import re
from bs4 import BeautifulSoup
html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>"""
soup = BeautifulSoup(html, 'html.parser')
prog = re.compile(r'\s*Shop\s*')
print(soup.find_all("a", string=prog))
# Output: []
I also tried retrieving the text using get_text():
text = soup.find_all("a")[0].get_text()
print(repr(text))
# Output: '\n\n\t\t\t\t\t\t\t\tShop \n'
I also ran the following code to make sure my regex was right, which seems to be the case:
result = prog.match(text)
print(repr(result.group()))
# Output: '\n\n\t\t\t\t\t\t\t\tShop \n'
I also tried selecting span instead of a, but I get the same issue. I'm guessing the problem is with find_all; I have read the BeautifulSoup documentation but still can't find it. Any help would be appreciated. Thanks!
Upvotes: 0
Views: 627
Reputation: 33384
The text Shop you are searching for is inside the span tag, next to an i tag, so neither the a nor the span has a single .string for your regex to match against.
You can instead use the regex to find the text node itself and then go up to its parent.
import re
from bs4 import BeautifulSoup
html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>"""
soup = BeautifulSoup(html, 'html.parser')
# string= is the current name of the older text= argument
print(soup.find(string=re.compile('Shop')).parent.parent)
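A slightly more robust variant of the same idea, as a sketch: search all text nodes with a whitespace-tolerant regex (anchored here with ^ and $ so that only a bare "Shop" matches), then use find_parent("a") to climb to the nearest enclosing anchor instead of hard-coding .parent.parent, which depends on the exact nesting depth. The variable names are illustrative.

```python
import re

from bs4 import BeautifulSoup

html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>"""

soup = BeautifulSoup(html, "html.parser")

# Match text nodes consisting only of "Shop" plus surrounding whitespace.
prog = re.compile(r"^\s*Shop\s*$")

# With only string= given, find_all returns the matching text nodes themselves;
# find_parent("a") then walks up to the enclosing <a>, however deep it is.
links = [text.find_parent("a") for text in soup.find_all(string=prog)]

print([a["href"] for a in links])
```

If a matching text node were not inside any anchor, find_parent would return None, so in real pages you may want to filter those out.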
If you have BeautifulSoup 4.7.1 or above, you can use the following CSS selector:
from bs4 import BeautifulSoup
html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>"""
soup = BeautifulSoup(html, 'html.parser')
# Soup Sieve 2.1+ deprecates :contains in favour of :-soup-contains
print(soup.select_one('a:contains("Shop")'))
Upvotes: 0
Reputation: 626920
The problem here is that the text you are looking for is in a tag that also has child tags, and when a tag has more than one child, its .string property is None.
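A minimal check on the question's markup illustrates this: both the a tag and the span have several children, so .string is None, and a find_all call that combines a tag name with string= (which compares against each tag's .string) has nothing to match. get_text(), by contrast, joins all descendant text, which is why the regex test on its result succeeded.

```python
from bs4 import BeautifulSoup

html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>"""

soup = BeautifulSoup(html, "html.parser")
a = soup.a
span = soup.span

# .string is only defined when a tag has a single child; with several it is None.
print(a.string)     # None: <a> holds "\n", <span> and "\n"
print(span.string)  # None: <span> holds "\nShop " and <i>

# get_text() concatenates all descendant strings, so the text is still reachable.
print(repr(span.get_text()))  # '\nShop '
```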
You can use a lambda expression in the .find call, and since you are looking for a fixed string, a plain 'Shop' in t.text condition is enough instead of a regex check:
soup.find(lambda t: t.name == "a" and 'Shop' in t.text)
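Since the question asks for the anchor whose text is exactly Shop once surrounding whitespace is ignored, a variant of the same lambda can compare the stripped text instead of testing for a substring. This sketch uses get_text(strip=True), which strips each text fragment and drops empty ones before joining them. The second link in the markup is hypothetical, added only to show that a substring check would also match it while the exact comparison does not.

```python
from bs4 import BeautifulSoup

html = """<a class="nav-link" href="https://cbd420.ch/fr/tous-les-produits/">
<span class="cbp-tab-title">
Shop <i class="fa fa-angle-down cbp-submenu-aindicator"></i></span>
</a>
<a href="/cart">Shopping cart</a>"""  # hypothetical extra link for illustration

soup = BeautifulSoup(html, "html.parser")

# Exact match after stripping whitespace, so "Shopping cart" is not selected.
shop_links = soup.find_all(
    lambda t: t.name == "a" and t.get_text(strip=True) == "Shop"
)

for link in shop_links:
    print(link["href"])
```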
Upvotes: 1