Mazzy

Reputation: 14219

Select all div siblings by using BeautifulSoup

I have an html file which has a structure like the following:

<div>
</div>

<div>
</div>

<div>
  <div>
  </div>
  <div>
  </div>
  <div>
  </div>
</div>

<div>
  <div>
  </div>
</div>

I would like to select all the sibling divs at the top level without selecting the divs nested inside the third and fourth blocks. If I use find_all(), I get every div, including the nested ones.

Upvotes: 4

Views: 8835

Answers (1)

Martijn Pieters

Reputation: 1123740

You can find direct children of the parent element:

soup.select('body > div')

to get only the div elements that are direct children of the top-level body tag; nested divs are not matched.
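As a rough illustration of the child selector, here is a minimal sketch run against markup shaped like the question's. The id attributes and the explicit html/body wrapper are assumptions added for the example (the html.parser backend does not insert a body tag on its own, so the selector needs one in the markup):

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the question's structure; the ids are
# hypothetical and only there to make the result easy to read.
html = """
<html><body>
<div id="a"></div>
<div id="b"></div>
<div id="c">
  <div id="c1"></div>
  <div id="c2"></div>
  <div id="c3"></div>
</div>
<div id="d">
  <div id="d1"></div>
</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# "body > div" matches only divs whose parent is <body>.
top_level_divs = soup.select("body > div")
top_level_ids = [div["id"] for div in top_level_divs]
print(top_level_ids)  # ['a', 'b', 'c', 'd']

# For comparison, find_all() descends into nested divs as well.
print(len(soup.find_all("div")))  # 8
```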

You could also find the first div, then grab all matching siblings with Element.find_next_siblings():

first_div = soup.find('div')
all_divs = [first_div] + first_div.find_next_siblings('div')
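A small sketch of the sibling approach, again with hypothetical ids added for readability. Note that soup.find('div') returns the first div in document order, so this only works as intended when that first div is one of the top-level blocks, as it is in the question's markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the question's HTML.
html = """<body>
<div id="a"></div>
<div id="b"></div>
<div id="c"><div id="c1"></div><div id="c2"></div></div>
<div id="d"><div id="d1"></div></div>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# The first div in the document, then every later div sharing its parent.
first_div = soup.find("div")
all_divs = [first_div] + first_div.find_next_siblings("div")

sibling_ids = [div["id"] for div in all_divs]
print(sibling_ids)  # ['a', 'b', 'c', 'd']
```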

Or you could use the element.children generator and filter those:

all_divs = (elem for elem in top_level.children if getattr(elem, 'name', None) == 'div')

where top_level is the element containing these div elements directly.
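The filter over .children can be sketched as follows, assuming top_level is the body element of a hypothetical sample document. The getattr() guard is there because .children also yields text nodes (the whitespace between tags). As an alternative not mentioned above, find_all() with recursive=False should give the same direct children:

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the question's HTML.
html = """<body>
<div id="a"></div>
<div id="b"></div>
<div id="c"><div id="c1"></div><div id="c2"></div></div>
<div id="d"><div id="d1"></div></div>
</body>"""

soup = BeautifulSoup(html, "html.parser")
top_level = soup.body  # assumed container of the top-level divs

# Keep only direct children that are <div> tags; text nodes are skipped.
child_divs = [
    elem for elem in top_level.children
    if getattr(elem, "name", None) == "div"
]
child_ids = [div["id"] for div in child_divs]
print(child_ids)  # ['a', 'b', 'c', 'd']

# Alternative: restrict find_all() to direct children only.
direct_divs = top_level.find_all("div", recursive=False)
direct_ids = [div["id"] for div in direct_divs]
print(direct_ids)  # ['a', 'b', 'c', 'd']
```

A list is used here instead of a generator expression so the result can be iterated more than once.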

Upvotes: 8
