Reputation: 4177
I want to get all the <a> tags which are children of <li>:
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
I know how to find element with particular class like this:
soup.find("li", { "class" : "test" })
But I don't know how to find all <a> which are children of <li class=test> but not any others.
Like I want to select:
<a>link1</a>
Upvotes: 188
Views: 361338
Reputation: 8035
Try this
li = soup.find('li', {'class': 'test'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print(child)
Upvotes: 221
Reputation: 310
Just came across this answer and checked the documentation to see that soup.findChildren is deprecated (BS 4.9). You can use soup.children instead, which only considers an element's direct children, not its descendants. Note that it yields every direct child, including text nodes, not only <a> tags.
li = soup.find('li', {'class': 'test'})
for child in li.children:
    print(child)
Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children
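Since .children also yields the whitespace text nodes between tags, a small filter is needed to keep only the <a> tags. A minimal sketch, assuming the HTML from the question and the built-in "html.parser" (the variable name direct_links is just for illustration):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

# .children yields text nodes (here, the newlines) as well as tags,
# so keep only Tag objects whose name is "a".
direct_links = [
    child for child in li.children
    if isinstance(child, Tag) and child.name == "a"
]
print(direct_links)
```

The nested link2 is skipped because it is a child of the inner <li>, not of <li class="test">.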
Upvotes: 8
Reputation: 5954
There's a very small section in the docs that shows how to find/find_all direct children:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument
In your case, since you want link1, which is the first direct child:
# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)
If you want all direct children:
# for all direct children
soup.find("li", { "class" : "test" }).findAll("a", recursive=False)
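To see what recursive=False changes, here is a minimal sketch that runs both variants against the HTML from the question. It assumes the built-in "html.parser"; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

# Default (recursive=True): searches every descendant of li.
all_links = li.find_all("a")

# recursive=False: searches only the direct children of li.
direct_links = li.find_all("a", recursive=False)

# find() accepts the same argument and returns a single tag (or None).
first_direct = li.find("a", recursive=False)

print([a.get_text() for a in all_links])     # ['link1', 'link2']
print([a.get_text() for a in direct_links])  # ['link1']
```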
Upvotes: 151
Reputation: 3118
"How to find all <a> which are children of <li class=test> but not any others?"
Given the HTML below (I added another <a> to show the difference between select and select_one):
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
<a>link3</a>
</li>
</div>
The solution is to use the child combinator (>), placed between two CSS selectors:
>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]
In case you want to find only the first child:
>>> soup.select_one('li.test > a')
<a>link1</a>
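For anyone who wants to run this outside the interactive prompt, here is a self-contained sketch using the same HTML. It assumes a Beautiful Soup version with CSS selector support and the built-in "html.parser"; the descendant selector 'li.test a' is added only for contrast.

```python
from bs4 import BeautifulSoup

html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
<a>link3</a>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only <a> tags directly inside <li class="test">.
direct = soup.select("li.test > a")

# Descendant combinator (a space): also matches the nested link2.
descendants = soup.select("li.test a")

# select_one returns just the first match instead of a list.
first = soup.select_one("li.test > a")

print([a.get_text() for a in direct])       # ['link1', 'link3']
print([a.get_text() for a in descendants])  # ['link1', 'link2', 'link3']
```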
Upvotes: 18
Reputation: 208
try this:
li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> descendants of li, nested ones included
Other reminders:
The find method returns only the first matching element. The find_all method returns all matching descendants in a list; pass recursive=False to restrict it to direct children.
Upvotes: 16
Reputation: 814
Yet another method: create a filter function that returns True for all desired tags:
def my_filter(tag):
    # .get() avoids a KeyError when the parent <li> has no class attribute
    return (tag.name == 'a' and
            tag.parent.name == 'li' and
            'test' in tag.parent.get('class', []))
Then just call find_all with the function as the argument:
for a in soup(my_filter): # or soup.find_all(my_filter)
    print(a)
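The filter-function approach can be tried end to end with the sketch below. It assumes the HTML from the question and the built-in "html.parser"; the function name is hypothetical, and the class lookup uses .get() with a default so that a parent <li> without a class attribute does not raise.

```python
from bs4 import BeautifulSoup

html = """
<div>
<li class="test">
<a>link1</a>
<ul>
<li>
<a>link2</a>
</li>
</ul>
</li>
</div>
"""


def is_direct_link_of_test_li(tag):
    """Return True for <a> tags whose parent is an <li> with class "test"."""
    parent = tag.parent
    return (
        tag.name == "a"
        and parent is not None
        and parent.name == "li"
        # The inner <li> has no class attribute, hence the default.
        and "test" in parent.get("class", [])
    )


soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(is_direct_link_of_test_li)
print(matches)
```

Unlike recursive=False, this searches the whole document, so it finds matching links under every <li class="test">, not just the first one.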
Upvotes: 8
Reputation: 18217
Perhaps you want to do
soup.find("li", { "class" : "test" }).find('a')
Upvotes: 24