vkt

Reputation: 1459

beautifulsoup - filter text of anchor tag

I have the following html content :

<a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a>

I want to filter the anchor tags whose text ends with a given regex pattern (like xyz here). I am looking to pass a regex pattern to findAll instead of iterating over all the anchor tags myself.

Upvotes: 1

Views: 797

Answers (3)

Ashok Kumar Jayaraman

Reputation: 3085

I think you can try this to get the anchor tag texts:

>>> from bs4 import BeautifulSoup
>>> html = """<a href="http://app_url1" >install app xyz</a>
... <a href="http://app_url2" >install app xyz</a>
... <a href="http://app_url3" >install app aaa</a>
... <a href="http://app_url4">install app aaa</a>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> anchor_texts = [a.get_text() for a in soup.find_all("a")]
>>> for text in anchor_texts:
...     print(text)
...

Output:

install app xyz
install app xyz
install app aaa
install app aaa
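The session above prints the anchor text without filtering it. If the texts have already been collected, a minimal sketch of keeping only those that end with a pattern is one more list comprehension. Note that this still loops over every anchor in Python, which the question hoped to avoid; the find_all filters in the other answers do the matching during the search instead. The names anchor_texts, pattern and matching are illustrative only.

```python
import re

from bs4 import BeautifulSoup

html = """<a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a>"""

soup = BeautifulSoup(html, "html.parser")

# One text string per <a> tag.
anchor_texts = [a.get_text() for a in soup.find_all("a")]

# re.search honours the "$" anchor, so only texts ending in "xyz" survive.
pattern = re.compile(r"xyz$")
matching = [text for text in anchor_texts if pattern.search(text)]
print(matching)  # ['install app xyz', 'install app xyz']
```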

Upvotes: 1

Rakesh

Reputation: 82755

Use a lambda with str.endswith as the text filter.

Example:

from bs4 import BeautifulSoup

html = """<div><a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a></div>"""

soup = BeautifulSoup(html, "html.parser")
# The filter receives each tag's .string, which can be None, so check for None first
print(soup.find_all("a", text=lambda x: x is not None and x.endswith("xyz")))
# --> [<a href="http://app_url1">install app xyz</a>, <a href="http://app_url2">install app xyz</a>]
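The question asks for a regex pattern rather than a fixed suffix. A minimal sketch of the same lambda idea with re.search, assuming trailing whitespace inside the link text should be ignored. text_ends_with is a hypothetical helper, the trailing space in the second link is made up for illustration, and the sketch uses string=, which is the name Beautiful Soup 4.4.0 and later gives to the text argument (text is still accepted as an older alias).

```python
import re

from bs4 import BeautifulSoup

# Illustrative markup: the second link has a trailing space before </a>.
html = """<div><a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz </a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a></div>"""


def text_ends_with(pattern):
    """Hypothetical helper: build a string filter matching text that ends with `pattern`."""
    # Wrap in a non-capturing group so "$" applies to the whole pattern, even with "|".
    regex = re.compile(f"(?:{pattern})$")
    # The tag's .string can be None, so guard against it; strip() drops trailing spaces.
    return lambda s: s is not None and regex.search(s.strip()) is not None


soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", string=text_ends_with("xyz"))
print([a["href"] for a in links])  # ['http://app_url1', 'http://app_url2']
```

The plain endswith("xyz") lambda would skip the second link because of the trailing space; stripping first is a judgment call that depends on how the real pages are formatted.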

Upvotes: 2

bigbounty

Reputation: 17358

You can pass a compiled regular expression as the text parameter of BeautifulSoup's find_all method.

from bs4 import BeautifulSoup
import re

html = """<a href="http://app_url1" >install app xyz</a>
<a href="http://app_url2" >install app xyz</a>
<a href="http://app_url3" >install app aaa</a>
<a href="http://app_url4">install app aaa</a>"""

soup = BeautifulSoup(html, "html.parser")

print(soup.find_all("a", text=re.compile(r"xyz$")))

Output:

[<a href="http://app_url1">install app xyz</a>, <a href="http://app_url2">install app xyz</a>]
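One caveat worth checking: the text filter is compared against each tag's .string, which is None when the anchor has more than one child (for example nested markup), so such links are not matched. A sketch under that assumption, using hypothetical markup where the second link wraps xyz in <b>, and a function passed as find_all's first argument that tests the full get_text() of each tag instead:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the second link wraps part of its text in <b>.
html = """<a href="http://app_url1">install app xyz</a>
<a href="http://app_url2">install app <b>xyz</b></a>
<a href="http://app_url3">install app aaa</a>"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"xyz$")

# The string/text filter checks tag.string, which is None for the nested link.
by_string = soup.find_all("a", string=pattern)

# A function filter receives each Tag, so it can search the concatenated text.
by_function = soup.find_all(
    lambda tag: tag.name == "a" and pattern.search(tag.get_text()) is not None
)

print([a["href"] for a in by_string])    # ['http://app_url1']
print([a["href"] for a in by_function])  # ['http://app_url1', 'http://app_url2']
```

For flat markup like the question's, both approaches return the same tags; the function filter only matters when link text may contain child elements.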

Upvotes: 3
