Reputation: 191
I am trying to locate the tag that contains some text in HTML like the one below, using BeautifulSoup. Here is my HTML:
<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>
and my code:
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
result = re.escape('4BDB1CD96')
tag = soup.find(['li', 'div', 'p', 'em'], string=re.compile(result))
This does not return the tag, but when I changed the call to: tag = soup.find(string=re.compile(result)) I got the result: Budweiser: 4BDB1CD96. So I want to know why this happens, and how to get the result as a tag.
Upvotes: 1
Views: 62
Reputation: 626920
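The problem here is that your tags contain nested tags, and the text you are searching for sits directly inside such a tag (p here). When string= is combined with tag names, BeautifulSoup tests each candidate tag's .string, which is None for a tag with more than one child, so the p never matches.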
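To see why the original call returns nothing, here is a small sketch that inspects .string on the question's markup. It assumes the built-in "html.parser" backend instead of lxml (only to avoid an extra dependency), and the variable names are illustrative.

```python
import re
from bs4 import BeautifulSoup

html = """<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>"""

soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# <p> has several children (an <em>, two text nodes, a <br>),
# so .string cannot pick a single string and is None.
print(p.string)  # None

# A tag with exactly one text child does expose .string.
single = BeautifulSoup("<p>Budweiser: 4BDB1CD96</p>", "html.parser").p
print(single.string)  # Budweiser: 4BDB1CD96

# The name + string= search therefore finds nothing in the nested markup.
missed = soup.find(["li", "div", "p", "em"], string=re.compile(re.escape("4BDB1CD96")))
print(missed)  # None
```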
So, the easiest approach is to pass a lambda to .find() that checks the tag name and whether the tag's .text property contains your pattern. Here, you do not even need a regex:
>>> tag = soup.find(lambda t: t.name in ['li','div','p','em'] and '4BDB1CD96' in t.text)
>>> tag
<p><em>code of Drink<br/></em>
Budweiser: 4BDB1CD96<br/>
price: 10$</p>
>>> tag.string
>>> tag.text
'code of Drink\nBudweiser: 4BDB1CD96\nprice: 10$'
Of course, you may use a regex for more complex searches:
r = re.compile('4BDB1CD96') # or whatever the pattern is
tag = soup.find(lambda t: t.name in ['li','div','p','em'] and r.search(t.text))
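As an alternative sketch (not part of the answer above): since soup.find(string=...) already locates the text node, find_parent() can walk up from it to the nearest enclosing tag with one of the wanted names. The helper name find_enclosing_tag is hypothetical, and "html.parser" is assumed here in place of lxml.

```python
import re
from bs4 import BeautifulSoup

html = """<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>"""


def find_enclosing_tag(soup, pattern, names):
    """Return the nearest ancestor (from `names`) of the first text node
    matching `pattern`, or None if no text node matches."""
    node = soup.find(string=re.compile(pattern))
    if node is None:
        return None
    return node.find_parent(names)


soup = BeautifulSoup(html, "html.parser")
tag = find_enclosing_tag(soup, re.escape("4BDB1CD96"), ["li", "div", "p", "em"])
print(tag.name)  # p
```

The two approaches can differ when wanted tags are nested: the lambda returns the first matching tag in document order (for example an outer div), while find_parent() returns the innermost matching ancestor. The string-based search also only sees text held in a single text node, so a code split across child tags would need the .text check instead.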
Upvotes: 2