Paul R.

Reputation: 422

BeautifulSoup4 search tag by text regex

I have these 2 scenarios where I want to search a tag by its text using a regular expression.

import re
from bs4 import BeautifulSoup
soup = BeautifulSoup('<B><A NAME="toc96446_13"></A>TEXT </B></P>', "html5lib")
soup.find('b', text=re.compile('TEXT'))  # returns None

I assume this doesn't work because the <B> tag that contains my TEXT also has another tag (<A>) inside it.

Also, how can I find a tag whose text consists only of digits?

soup = BeautifulSoup("<p>169</p>", "html5lib")
soup.find('p', text=re.compile(r'[0-9]{1,}'))

Thanks

Upvotes: 0

Views: 54

Answers (1)

Andrej Kesely

Reputation: 195408

For searching elements, you can pass a lambda to find() and test tag.text:

from bs4 import BeautifulSoup
import re

data = """
<B><A NAME="toc96446_13"></A>TEXT</B></P>
"""
soup = BeautifulSoup(data, 'html5lib')
print(soup.find(lambda t: t.name=='b' and re.search(r'TEXT', t.text)))

Prints:

<b><a name="toc96446_13"></a>TEXT</b>
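A sketch of why the lambda works where text= does not. It assumes Beautiful Soup 4.4 or later (where string= is the newer name for text=) and uses the built-in "html.parser" instead of html5lib to avoid the extra dependency; the result should be the same for this snippet. When text=/string= is combined with a tag name, the pattern is tested against the tag's .string, and .string is None when the tag has more than one child (here the empty <a> plus the "TEXT " node). tag.text, like get_text(), joins all descendant strings, so the regex sees "TEXT ".

```python
import re

from bs4 import BeautifulSoup

html = '<B><A NAME="toc96446_13"></A>TEXT </B></P>'
soup = BeautifulSoup(html, "html.parser")  # the answer above uses html5lib

b = soup.find("b")
print(b.string)            # None: <b> has two children, the <a> tag and "TEXT "
print(repr(b.get_text()))  # 'TEXT ': all descendant text joined together

# string= is matched against .string, which is None here, so nothing is found
print(soup.find("b", string=re.compile("TEXT")))  # None

# Matching against the full text (the lambda approach) succeeds
match = soup.find(lambda t: t.name == "b" and re.search(r"TEXT", t.get_text()))
print(match)  # <b><a name="toc96446_13"></a>TEXT </b>
```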

For digits only, you can use the regex anchors ^ and $ (note: this matches only the first <p> tag, containing 169, and not the second, containing ab1234):

soup = BeautifulSoup("<p>169</p><p>ab1234</p>", 'html5lib')
print(soup.find('p', text=re.compile(r'^\d+$')))

Prints:

<p>169</p>
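If you need every matching tag rather than just the first, find_all() accepts the same arguments. One caveat: in Python's re module, $ also matches just before a trailing newline, so ^\d+$ accepts "169\n", while \A and \Z match only at the true start and end of the string. A small sketch, again using "html.parser" instead of html5lib to keep dependencies minimal; the "169\n" and "42" inputs are made up for illustration:

```python
import re

from bs4 import BeautifulSoup

html = "<p>169</p><p>ab1234</p><p>169\n</p><p>42</p>"
soup = BeautifulSoup(html, "html.parser")

# ^...$ : "$" can match before a final newline, so "169\n" is accepted too
loose = soup.find_all("p", string=re.compile(r"^\d+$"))
# \A...\Z : only strings made up entirely of digits, nothing else
strict = soup.find_all("p", string=re.compile(r"\A\d+\Z"))

print([p.get_text() for p in loose])   # ['169', '169\n', '42']
print([p.get_text() for p in strict])  # ['169', '42']
```

Note that \d in a Python 3 str pattern matches any Unicode decimal digit; use [0-9] instead if you want ASCII digits only.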

Upvotes: 2
