BOBTHEBUILDER

Reputation: 393

BeautifulSoup find_all() with a series of nested tags

I want to search a website for specific tags nested inside other tags using BeautifulSoup's find_all(). For example, I want the data within:

<li class='x'>
    <small class='y'>

I am currently using the code below, but it returns extra results from elsewhere on the HTML page because I haven't specified that I only want to search within li tags with class x.

labels = [element.text for element in soup.find_all('small', {'class':'label'})]

How do I restrict the search to a specific parent element?

Upvotes: 1

Views: 1022

Answers (2)

HedgeHog

Reputation: 25048

You can specify the parent like this:

optionA = [element.text for element in soup.find('ul').find_all('small', {'class':'label'})]

This first finds the parent <ul> and then all <small> elements with class label inside it.

optionB = [element.text for element in soup.select('ul small.label')]

Alternatively, use CSS selectors, which in my opinion are much better for chaining tags and classes.

Example

from bs4 import BeautifulSoup

html = '''<ul>
  <li><small class="label">Coffee</small></li>
  <li><small class="label">Tea</small></li>
  <li><small class="label">Milk</small></li>
</ul>'''

soup = BeautifulSoup(html, 'html.parser')

optionA = [element.text for element in soup.find('ul').find_all('small', {'class':'label'})]
optionB = [element.text for element in soup.select('ul small.label')]

print(optionA)
print(optionB)

Output

['Coffee', 'Tea', 'Milk']
['Coffee', 'Tea', 'Milk']
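
Note that soup.find('ul') returns only the first matching parent. As a rough sketch for the case in the question, where several li tags with class x may appear across the page, one option is to loop over every matching parent and search inside each one. The markup below is hypothetical and only modeled on the question's li class x / small class y structure:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question; not taken from a real page.
html = """
<ul>
  <li class="x"><small class="y">Coffee</small></li>
  <li class="z"><small class="y">Tea</small></li>
</ul>
<ol>
  <li class="x"><small class="y">Milk</small></li>
</ol>
<small class="y">Footer</small>
"""

soup = BeautifulSoup(html, "html.parser")

# Visit every <li class="x">, then collect the <small class="y"> tags inside it.
labels = [
    small.get_text()
    for li in soup.find_all("li", class_="x")
    for small in li.find_all("small", class_="y")
]

print(labels)  # ['Coffee', 'Milk']
```

The small tags under li class z and the one outside any list are skipped, because each inner find_all() only searches the descendants of the li it is called on.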

Upvotes: 1

You can use .select() to apply CSS selectors, which allow more advanced searching:

 soup.select('.label>small')

This finds every small element whose immediate parent has the class label. See https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors for more examples of CSS selectors, though note that Beautiful Soup may not support some of the newer syntax.
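
Applied to the structure in the question, the same idea might look like the sketch below. The HTML is a made-up sample, and the selector strings assume the class names x and y from the question. It also contrasts the child combinator (>) with the descendant combinator (a space):

```python
from bs4 import BeautifulSoup

# Made-up sample markup; class names x and y are taken from the question.
html = """
<ul>
  <li class="x"><small class="y">inside</small></li>
  <li class="x"><span><small class="y">nested</small></span></li>
</ul>
<small class="y">outside</small>
"""

soup = BeautifulSoup(html, "html.parser")

# '>' matches only <small class="y"> tags that are direct children of <li class="x">.
direct = [el.get_text() for el in soup.select("li.x > small.y")]

# A space matches <small class="y"> at any depth below <li class="x">.
anywhere = [el.get_text() for el in soup.select("li.x small.y")]

print(direct)    # ['inside']
print(anywhere)  # ['inside', 'nested']
```

Neither selector returns the small tag outside the list, which is the filtering the question asks for.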

Upvotes: 2
