Bioaim

Reputation: 1016

Python BeautifulSoup - get elements without child elements

Example HTML:

<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>

I am currently getting all the spans within the tag that has the labels class, but I only want the 2 spans directly under that tag. I don't want any span tags from its child elements.

Currently I am doing it like this:

First I get the labels HTML from a much bigger HTML document:

labels = html.findAll(class_="labels")

Then I extract the span tags from it:

spans = labels[0].findAll('span', {"class": None})

In my case the "class": None filter doesn't change anything, because none of the span tags has a class.

So my question again: how can I get just the first 2 span tags, without the ones nested in child elements?

Upvotes: 2

Views: 5782

Answers (3)

Bioaim

Reputation: 1016

There is a short note in the BeautifulSoup docs about the recursive=False argument, which restricts the search to direct children of the tag.

So the answer to this problem was:

spans = labels[0].findAll('span', {"class": None}, recursive=False)
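For anyone who wants to try this against the HTML from the question, here is a minimal self-contained sketch. It assumes the bs4 package with the built-in "html.parser" backend; the names SAMPLE, soup and container are only illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

# The example HTML from the question (SAMPLE is an illustrative name)
SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
container = soup.find("p", class_="labels")

# Default search is recursive: it also finds the span nested inside <time>
all_spans = container.find_all("span")

# recursive=False only looks at direct children of the container
direct_spans = container.find_all("span", recursive=False)
direct_texts = [span.get_text() for span in direct_spans]

print(len(all_spans), direct_texts)
```

With this input the recursive search should return three spans, while the non-recursive one returns only Item1 and Item2.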

Upvotes: 4

Learner

Reputation: 5292

To extract the first two span elements, try the following:

>>> [i.text for i in html.find('p',{"class":"labels"}).findAll('span', {"class": None})[0:2]]
[u'Item1', u'Item2']

If you want to grab every span inside the labels class, including nested ones, remove the slice:

>>> [i.text for i in html.find('p',{"class":"labels"}).findAll('span', {"class": None})]
[u'Item1', u'Item2', u'I dont want to get this span']

Upvotes: 0

Tomalak

Reputation: 338228

for container in html.findAll(class_="labels"):
    spans = container.findAll('span', {"class": None})
    # Keep only the spans whose direct parent is the container itself
    spans = [span for span in spans if span.parent is container]

Alternatively, iterate over .children:

for container in html.findAll(class_="labels"):
    # Match direct children that are span tags without a class attribute
    is_plain_span = lambda c: c.name == 'span' and c.get('class') is None
    spans = [child for child in container.children if is_plain_span(child)]
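Both variants can be checked side by side on the HTML from the question. The following is only a sketch under a few assumptions: it uses bs4 with the "html.parser" backend, and the helper names direct_spans_by_parent and direct_spans_by_children are made up for illustration. It uses an isinstance check against Tag so that text nodes among the children are skipped explicitly.

```python
from bs4 import BeautifulSoup, Tag

# The example HTML from the question (SAMPLE is an illustrative name)
SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""


def direct_spans_by_parent(container):
    """Search recursively, then keep spans whose parent is the container."""
    return [s for s in container.find_all("span") if s.parent is container]


def direct_spans_by_children(container):
    """Walk the direct children and keep classless span tags only."""
    return [
        child
        for child in container.children
        if isinstance(child, Tag)  # skip whitespace and other text nodes
        and child.name == "span"
        and child.get("class") is None
    ]


soup = BeautifulSoup(SAMPLE, "html.parser")
container = soup.find(class_="labels")

by_parent_texts = [s.get_text() for s in direct_spans_by_parent(container)]
by_children_texts = [s.get_text() for s in direct_spans_by_children(container)]

print(by_parent_texts)
print(by_children_texts)
```

On this input both helpers should give the same two spans. The parent check still runs a full recursive search first, so for large subtrees the .children variant (or recursive=False) does less work.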

Upvotes: 2
