BeautifulSoup cannot extract item using find_all()

Question

I am trying to locate a piece of text in HTML like the one below using BeautifulSoup. Here is my HTML:

code of Drink

Budweiser: 4BDB1CD96

price: 10$

with this code:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
result = re.escape('4BDB1CD96')
tag = soup.find(['li', 'div', 'p', 'em'], string=re.compile(result))

This does not return the tag. But when I change the call to: tag = soup.find(string=re.compile(result)) I get the result: Budweiser: 4BDB1CD96. So I want to know why this happens, and how to get the result as a tag.

Wiktor Stribiżew · Accepted Answer

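The problem here is that your tags have nested tags, and the text you are searching for is inside such a tag (p here). When string= is combined with a tag name, BeautifulSoup compares the pattern against the tag's .string, and .string is None for any tag that has more than one child.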
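
To make that concrete, here is a minimal sketch. The markup in it is an assumption, since the HTML in the question did not survive intact; it only illustrates a p tag that holds both nested tags and bare text. It uses the built-in html.parser so it runs without lxml.

```python
import re
from bs4 import BeautifulSoup

# Assumed markup: the question's real HTML is not preserved, so this shape is a guess.
html = "<p><em>code of Drink</em><br>Budweiser: 4BDB1CD96<br>price: 10$</p>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
em = soup.find("em")

# A tag whose only child is a string exposes that string through .string.
print(em.string)

# A tag with several children has no single string, so .string is None.
print(p.string)

# .text joins every descendant string, so the code is visible there.
print("4BDB1CD96" in p.text)

# Tag names plus string= compares the pattern with each tag's .string,
# so neither <em> (wrong text) nor <p> (.string is None) matches.
found = soup.find(["li", "div", "p", "em"], string=re.compile("4BDB1CD96"))
print(found)
```

With this markup the search returns None even though p is in the list of names, which matches the behaviour described in the question.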

So, the easiest approach is to use a lambda inside .find() that checks the tag name and whether the tag's .text property contains your pattern. Here, you do not even need a regex:

>>> tag = soup.find(lambda t: t.name in ['li','div','p','em'] and '4BDB1CD96' in t.text)
>>> tag
code of Drink

Budweiser: 4BDB1CD96

price: 10$
>>> tag.string
>>> tag.text
'code of Drink
Budweiser: 4BDB1CD96
price: 10$'

Of course, you may use a regex for more complex searches:

r = re.compile('4BDB1CD96') # or whatever the pattern is
tag = soup.find(lambda t: t.name in ['li','div','p','em'] and r.search(t.text))
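
One thing to keep in mind: .find() returns the first match in document order, so if an outer tag such as a div is also in the list of names and wraps the text, the lambda returns that outer tag. If you want the nearest enclosing tag instead, a possible alternative (not part of the approach above) is to find the text node first and then walk up with find_parent(). The sketch below assumes markup of the following shape, which is a guess and not the question's actual HTML:

```python
import re
from bs4 import BeautifulSoup

# Assumed markup for illustration only: the same <p> as before, wrapped in a <div>.
html = (
    "<div><p><em>code of Drink</em><br>"
    "Budweiser: 4BDB1CD96<br>price: 10$</p></div>"
)
soup = BeautifulSoup(html, "html.parser")

pattern = re.compile(re.escape("4BDB1CD96"))
names = ["li", "div", "p", "em"]

# Lambda approach: the first tag in document order whose text contains
# the code, which here is the outer <div>.
outer = soup.find(lambda t: t.name in names and pattern.search(t.text))
print(outer.name)

# Alternative: locate the matching text node, then climb to the closest
# ancestor whose name is in the list, which here is the <p>.
node = soup.find(string=pattern)
inner = node.find_parent(names) if node is not None else None
print(inner.name if inner is not None else None)
```

Which of the two is preferable depends on whether you want the whole enclosing block or only the tag that directly holds the text.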
