Reputation: 1016
Example HTML:
<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>
I am currently getting all the spans inside the tag with the labels class, but I only want the two spans directly under that tag. I don't want any span tags from its child elements.
This is how I am currently doing it.
First I get the labels element out of a much larger HTML document:
labels = html.findAll(class_="labels")  # class_ (trailing underscore) filters on the HTML class attribute
Then I extract the span tags from it:
spans = labels[0].findAll('span', {"class": None})
In my case the {"class": None} filter doesn't change anything, because none of the span tags has a class.
So, again: how can I get just the first two span tags, without the spans inside child elements?
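A minimal, self-contained reproduction of the behaviour described in the question, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in "html.parser"; the HTML constant is simply the example markup from above. It shows that the nested span is matched as well.

```python
from bs4 import BeautifulSoup

# The example markup from the question.
HTML = """<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>"""

html = BeautifulSoup(HTML, "html.parser")
labels = html.findAll(class_="labels")

# findAll searches every descendant by default, so the span
# nested inside <time> is returned too.
spans = labels[0].findAll('span', {"class": None})
texts = [span.text for span in spans]
print(texts)  # ['Item1', 'Item2', 'I dont want to get this span']
```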
Upvotes: 2
Views: 5782
Reputation: 1016
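There is a short passage in the BeautifulSoup docs about recursive=False: by default findAll() searches all descendants of a tag, but with recursive=False it only looks at the tag's direct children.
So the answer to this problem was: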
spans = labels[0].findAll('span', {"class": None}, recursive=False)
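A runnable sketch of this fix against the question's markup, assuming the bs4 package and the built-in "html.parser" (the HTML constant is just the example from the question):

```python
from bs4 import BeautifulSoup

# The example markup from the question.
HTML = """<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>"""

html = BeautifulSoup(HTML, "html.parser")
labels = html.findAll(class_="labels")

# recursive=False restricts the search to direct children of the <p>,
# so the span nested inside <time> is skipped.
spans = labels[0].findAll('span', {"class": None}, recursive=False)
print([span.text for span in spans])  # ['Item1', 'Item2']
```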
Upvotes: 4
Reputation: 5292
To extract the first two span elements, try the following:
>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})[0:2]]
[u'Item1', u'Item2']
If you want to grab all the span elements inside the labels class, remove the slice:
>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})]
[u'Item1', u'Item2', u'I dont want to get this span']
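The slice works on the question's markup because the unwanted span happens to come last. A sketch with a hypothetical reordered variant of that markup (the <time> element moved first), assuming bs4 and "html.parser", shows that the slice selects by document position while recursive=False selects by nesting depth:

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's markup with <time> moved first.
REORDERED = """<p class="labels">
<time class="time">
<span>I dont want to get this span</span>
</time>
<span>Item1</span>
<span>Item2</span>
</p>"""

html = BeautifulSoup(REORDERED, "html.parser")
p = html.find('p', {"class": "labels"})

# The slice keeps the first two matches in document order, whatever their depth.
sliced = [i.text for i in p.findAll('span', {"class": None})[0:2]]
# recursive=False keeps only direct children, whatever their position.
direct = [i.text for i in p.findAll('span', {"class": None}, recursive=False)]

print(sliced)  # ['I dont want to get this span', 'Item1']
print(direct)  # ['Item1', 'Item2']
```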
Upvotes: 0
Reputation: 338228
for container in html.findAll(class_="labels"):
    spans = container.findAll('span', {"class": None})
    # keep only the spans whose immediate parent is the container itself
    spans = [span for span in spans if span.parent is container]
Alternatively, iterate over .children:
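for container in html.findAll(class_="labels"):
    # string children have .name None, so the `and` stops before .get() is called;
    # Tag.get('class') returns None when the tag has no class attribute
    is_plain_span = lambda c: c.name == 'span' and c.get('class') is None
    spans = [child for child in container.children if is_plain_span(child)]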
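Both loops above can be checked against the question's markup with a small harness like this one, a sketch assuming the bs4 package and the built-in "html.parser"; the helper names spans_by_parent and spans_by_children are hypothetical. It uses isinstance(child, Tag) rather than relying on the .name of string children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# The example markup from the question.
HTML = """<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>"""


def spans_by_parent(container):
    """Search all descendants, then keep spans whose parent is the container."""
    spans = container.findAll('span', {"class": None})
    return [span for span in spans if span.parent is container]


def spans_by_children(container):
    """Walk only the direct children and keep class-less span tags."""
    # .children also yields the whitespace strings between tags, so check for Tag first.
    return [child for child in container.children
            if isinstance(child, Tag) and child.name == 'span' and child.get('class') is None]


html = BeautifulSoup(HTML, "html.parser")
for container in html.findAll(class_="labels"):
    by_parent = [span.text for span in spans_by_parent(container)]
    by_children = [span.text for span in spans_by_children(container)]
    print(by_parent, by_children)  # ['Item1', 'Item2'] ['Item1', 'Item2']
```

Both return the same two spans here. The parent check still walks every descendant before filtering, whereas iterating .children (like recursive=False) only looks one level down.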
Upvotes: 2