Reputation: 259
<div class="info">
<h3> Height:
<span>1.1</span>
</h3>
</div>
<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>
This is a portion of the page. Ultimately, I want to extract the 111111111. I know I can do
soup.find_all("div", {"class": "info"})
to get a list of both divs; however, I would prefer not to loop over them to check which one contains the text "Number".
Is there a more elegant way to extract "111111111", so that the search still matches div elements with class "info" but only those that contain "Number"?
I also tried
numberSoup = soup.find('h3', text='Number')
but it returns None.
Upvotes: 4
Views: 10347
Reputation: 216
You can write your own filter function and pass it as the argument to find_all.
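from bs4 import BeautifulSoup

def number_span(tag):
    # Match a <span> whose parent's first child is the text "Number:"
    return tag.name == 'span' and 'Number:' in tag.parent.contents[0]

# html is the page markup shown in the question
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(number_span)  # list of matching <span> tags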
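As an alternative sketch (not part of the original answer), you could search for the text node itself with a regular expression and then step forward to the next span. The html string below is copied from the question, and the variable names label and number are only illustrative.

```python
import re
from bs4 import BeautifulSoup

# Markup copied from the question
html = """
<div class="info">
<h3> Height:
<span>1.1</span>
</h3>
</div>
<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the first text node that contains "Number" (a NavigableString)
label = soup.find(string=re.compile("Number"))

# Move forward in the document to the next <span> and read its text
number = label.find_next("span").get_text(strip=True)
print(number)
```

This avoids an explicit loop over the divs, but it assumes the span you want is the first one that follows the "Number" label in the document.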
By the way, the reason you can't fetch the tag with the text parameter is that text only matches tags whose .string value equals the value you pass. If a tag contains more than one child, it is not clear what .string should refer to, so .string is defined to be None. Here the h3 contains both the "Number:" text and a span, so its .string is None and the search finds nothing.
See the Beautiful Soup documentation for details.
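A small sketch of that .string behaviour, using only the second div from the question. The variable names are illustrative, and string= is used here as the newer spelling of the text= argument.

```python
from bs4 import BeautifulSoup

# Second div from the question
html = """
<div class="info">
<h3> Number:
<span>111111111</span>
</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

# The h3 has several children (text, span, text), so .string is None
h3_string = h3.string

# The span has exactly one child, a text node, so .string is that text
span_string = h3.span.string

# Filtering h3 tags by string compares against .string, so nothing matches
by_string = soup.find("h3", string="Number")

print(h3_string, span_string, by_string)
```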
Upvotes: 7