Reputation: 1459
I have the following HTML content:
<a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a>
I want to select only the anchor tags whose text ends with a given regex pattern (such as xyz here).
I would like to pass a regex pattern to findAll rather than iterate over all the anchor
tags myself.
Upvotes: 1
Views: 797
Reputation: 3085
I think you can try this to get the anchor tag texts:
>>> from bs4 import BeautifulSoup
>>> html = """<a href="http://app_url1" >install app xyz</a>
... <a href="http://app_url2" >install app xyz</a>
... <a href="http://app_url3" >install app aaa</a>
... <a href="http://app_url4">install app aaa</a>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> # Collect the text of each anchor tag as a separate list item
>>> anchor_texts = [a.get_text() for a in soup.find_all("a")]
>>> for text in anchor_texts:
...     print(text)
Output:
install app xyz
install app xyz
install app aaa
install app aaa
Upvotes: 1
Reputation: 82755
You can pass a lambda that uses str.endswith as the text filter.
Example:
from bs4 import BeautifulSoup
html = """<div><a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a></div>"""
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("a", text=lambda x: x is not None and x.endswith("xyz")))
# --> [<a href="http://app_url1">install app xyz</a>, <a href="http://app_url2">install app xyz</a>]
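As a side note, Beautiful Soup 4.4.0 and later call this argument string, with text kept as the older name. Below is a minimal sketch of the same filter wrapped in a small helper, assuming a recent bs4 release; the helper name anchors_ending_with is hypothetical and not part of the library.

```python
from bs4 import BeautifulSoup

html = """<div><a href="http://app_url1">install app xyz</a>
<a href="http://app_url2">install app xyz</a>
<a href="http://app_url3">install app aaa</a>
<a href="http://app_url4">install app aaa</a></div>"""


def anchors_ending_with(soup, suffix):
    """Hypothetical helper: return <a> tags whose string ends with `suffix`."""
    # The callable receives each tag's .string, which may be None,
    # so guard against None before calling endswith.
    return soup.find_all(
        "a",
        string=lambda s: s is not None and s.endswith(suffix),
    )


soup = BeautifulSoup(html, "html.parser")
matches = anchors_ending_with(soup, "xyz")

# Pull out the href of every matching anchor.
hrefs = [a["href"] for a in matches]
print(hrefs)
```

Keeping the None check inside the lambda matters because the filter is also called for tags that have no single string child.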
Upvotes: 2
Reputation: 17358
You can use the text parameter of BeautifulSoup's find_all method.
from bs4 import BeautifulSoup
import re
html = """<a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a>"""
soup = BeautifulSoup(html, "html.parser")
print(soup.findAll("a", text=re.compile("xyz$")))
Output:
[<a href="http://app_url1">install app xyz</a>, <a href="http://app_url2">install app xyz</a>]
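One caveat worth checking: the text/string filter is compared against each tag's .string, which is None when the anchor contains nested markup such as <b>. The sketch below is one possible workaround under that assumption; it passes a function to find_all and matches the regex against get_text() instead. The sample HTML with the nested <b> tag and the name is_matching_anchor are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Second anchor has nested markup, so its .string is None.
html = """<a href="http://app_url1">install app xyz</a>
<a href="http://app_url2">install <b>app</b> xyz</a>
<a href="http://app_url3">install app aaa</a>"""

soup = BeautifulSoup(html, "html.parser")

# Allow optional trailing whitespace before the end of the text.
pattern = re.compile(r"xyz\s*$")


def is_matching_anchor(tag):
    """Hypothetical filter: True for <a> tags whose full text matches the pattern."""
    return tag.name == "a" and pattern.search(tag.get_text()) is not None


# find_all accepts a function that takes a Tag and returns True or False.
matches = soup.find_all(is_matching_anchor)
hrefs = [a["href"] for a in matches]
print(hrefs)

# For comparison, the plain string filter on the same document.
print(soup.find_all("a", string=pattern))
```

The function filter still lets find_all do the traversal, so no separate loop over the anchors is needed.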
Upvotes: 3