Reputation: 422
I have two scenarios where I want to find a tag by its text using a regular expression.
soup = BeautifulSoup('<B><A NAME="toc96446_13"></A>TEXT </B></P>', "html5lib")
soup.find('b', text=re.compile('TEXT'))
I assume this doesn't work because of the <A> tag nested inside the <B> tag that actually contains my TEXT.
Also how can I find a tag containing only digits?
soup = BeautifulSoup("<p>169</p>", "html5lib")
soup.find('p', text=re.compile(r'[0-9]{1,}'))
Thanks
Upvotes: 0
Views: 54
Reputation: 195408
For searching elements, you can use a lambda and tag.text:
from bs4 import BeautifulSoup
import re
data = """
<B><A NAME="toc96446_13"></A>TEXT</B></P>
"""
soup = BeautifulSoup(data, 'html5lib')
print(soup.find(lambda t: t.name=='b' and re.search(r'TEXT', t.text)))
Prints:
<b><a name="toc96446_13"></a>TEXT</b>
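As for why the original call finds nothing: when a tag name is combined with text= (called string= in newer versions of Beautiful Soup), the pattern is checked against the tag's .string, and .string is None whenever the tag has more than one child. Here <b> contains both the empty <a> and the text node. A minimal sketch of that behaviour, assuming the built-in html.parser instead of html5lib just to keep it dependency-light:

```python
import re
from bs4 import BeautifulSoup

# Assumption: html.parser is used here instead of html5lib; the markup is a
# trimmed version of the snippet from the question.
html = '<b><a name="toc96446_13"></a>TEXT</b>'
soup = BeautifulSoup(html, "html.parser")
b = soup.find("b")

# <b> has two children (the <a> tag and the text node), so .string is None.
string_is_none = b.string is None

# The string= filter compares against .string, so it finds nothing here.
text_filter_hit = soup.find("b", string=re.compile("TEXT"))

# The lambda compares against the concatenated text of all descendants.
lambda_hit = soup.find(lambda t: t.name == "b" and re.search("TEXT", t.get_text()))

print(string_is_none, text_filter_hit, lambda_hit)
```

That is why the lambda with tag.text works: it looks at the combined text of the tag rather than at a single child string.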
For digits only, you can use the regex anchors ^ and $ (note that this matches only the first <p> tag, which contains 169, not the second one, which contains ab1234):
soup = BeautifulSoup("<p>169</p><p>ab1234</p>", 'html5lib')
print(soup.find('p', text=re.compile(r'^\d+$')))
Prints:
<p>169</p>
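If a digits-only paragraph may also contain a nested tag (as in the first scenario), the two ideas can be combined. This is only a sketch under assumptions: the helper name is_digits_only and the third <p> in the sample markup are made up for illustration, and html.parser is used instead of html5lib.

```python
import re
from bs4 import BeautifulSoup

# Assumption: sample markup extended with a hypothetical third <p> that holds
# an empty <a> next to its digits.
html = '<p>169</p><p>ab1234</p><p><a name="x"></a>2024</p>'
soup = BeautifulSoup(html, "html.parser")

def is_digits_only(tag):
    """Hypothetical helper: True for <p> tags whose whole text is digits."""
    # fullmatch requires the entire stripped text to match, like ^...$.
    return tag.name == "p" and re.fullmatch(r"\d+", tag.get_text(strip=True)) is not None

digit_paragraphs = [p.get_text(strip=True) for p in soup.find_all(is_digits_only)]
print(digit_paragraphs)  # ['169', '2024']
```

re.fullmatch plays the same role as wrapping the pattern in ^ and $, and get_text(strip=True) ignores surrounding whitespace, so a paragraph such as <p> 169 </p> would also count.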
Upvotes: 2