Lee Jack

Reputation: 191

BeautifulSoup cannot extract tag using find()

I am trying to use BeautifulSoup to find the tag that contains a piece of text in HTML like the following. Here is my HTML:

<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>

with this code:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html,'lxml')
result = re.escape('4BDB1CD96')
tag = soup.find(['li','div','p','em'],string=re.compile(result))

I cannot extract the tag (find() returns None). But when I change the call to tag = soup.find(string=re.compile(result)), I get the result Budweiser: 4BDB1CD96. Why does this happen, and how can I get the result as a tag?

Upvotes: 1

Views: 62

Answers (1)

Wiktor Stribiżew

Reputation: 626920

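The problem here is that your tags have nested tags, and the text you are searching for is inside such a tag (p here). When you combine a tag name with string=, BeautifulSoup matches the pattern against each tag's .string, and .string is None for any tag that has more than one child. Your <p> has four children (the <em>, two text nodes and a <br>), and the <em> has two, so neither can match. Without a tag name, string= matches the text node itself, which is why your second call returns Budweiser: 4BDB1CD96 as a string rather than a tag.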
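To see this concretely, here is a minimal sketch, assuming bs4 is installed. It uses the built-in "html.parser" instead of lxml only so it runs without extra dependencies; the tree for this snippet should be equivalent.

```python
import re
from bs4 import BeautifulSoup

html = """<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>"""

soup = BeautifulSoup(html, "html.parser")
p = soup.p
em = soup.em

# Both tags have more than one child, so .string is None
# and a string= filter combined with a tag name cannot match them.
print(p.string)   # None
print(em.string)  # None

# Without a tag name, string= matches the bare text node instead.
node = soup.find(string=re.compile(re.escape("4BDB1CD96")))
print(str(node).strip())  # Budweiser: 4BDB1CD96
print(node.parent.name)   # p
```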

So, the easiest approach is to use a lambda inside .find() that checks the tag name and whether the tag's .text property (which concatenates all nested text) contains your pattern. Here, you do not even need a regex:

>>> tag = soup.find(lambda t: t.name in ['li','div','p','em'] and '4BDB1CD96' in t.text)
>>> tag
<p><em>code of Drink<br/></em>
Budweiser: 4BDB1CD96<br/>
price: 10$</p>
>>> tag.string
>>> tag.text
'code of Drink\nBudweiser: 4BDB1CD96\nprice: 10$'
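One caveat with the .text test: every ancestor of the text also "contains" it, so .find() returns the outermost matching tag. A small sketch, assuming a hypothetical wrapping <div> that is not in the question's HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical wrapper: the <div> is added for illustration only.
html = """<div><p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p></div>"""

soup = BeautifulSoup(html, "html.parser")

def contains_code(t):
    # Same test as the lambda above: tag name in the list and code in the text.
    return t.name in ['li', 'div', 'p', 'em'] and '4BDB1CD96' in t.text

# find() returns the first match in document order, i.e. the outer <div>.
print(soup.find(contains_code).name)  # div

# find_all() returns every match, outermost first.
print([t.name for t in soup.find_all(contains_code)])  # ['div', 'p']
```

If you need the innermost tag, taking the last element of the find_all() list works for a single occurrence like this one, but it is a heuristic rather than a general rule.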

Of course, you may use a regex for more complex searches:

import re
r = re.compile('4BDB1CD96') # or whatever the pattern is
tag = soup.find(lambda t: t.name in ['li','div','p','em'] and r.search(t.text))
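Another option, sketched below under the same assumptions (bs4 installed, "html.parser" used for convenience), is to keep the string-only search that already worked and climb from the matched text node to its enclosing tag with .parent or .find_parent():

```python
import re
from bs4 import BeautifulSoup

html = """<p><em>code of Drink<br></em>
Budweiser: 4BDB1CD96<br>
price: 10$</p>"""

soup = BeautifulSoup(html, "html.parser")
r = re.compile('4BDB1CD96')

# Match the text node first (this is the call that worked in the question).
node = soup.find(string=r)

# .parent is the tag that directly contains the text node.
print(node.parent.name)  # p

# find_parent() climbs to the nearest ancestor with one of these names.
tag = node.find_parent(['li', 'div', 'p', 'em'])
print(tag.name)  # p
```

Unlike the .text lambda, this returns the innermost enclosing tag. It only works when the pattern sits inside a single text node, though: if the text is split across tags (for example 4BDB<b>1CD96</b>), string= will not find it, while the .text test still will.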

Upvotes: 2
