Ankit Singh

Reputation: 23

How do I find the direct children (not the children of children) of a div in HTML using BeautifulSoup?

Markup:

<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>

I want to get a list of the direct children of the div with class "parent-div",

i.e. a list like:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>]

I am using the BeautifulSoup code below:

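from bs4 import BeautifulSoup as soup  # assumed import: the snippet calls BeautifulSoup via the alias `soup`

page_soup = soup(page_html, "html.parser")
main_cont = page_soup.find('div', {'class': 'parent-div'}).findAll('div')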

This code gives me a list of all descendant divs, including the nested ones:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-1.1">
        </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>,<div class = "child-2.1">
        </div>]

How do I get a list of just the immediate children of the parent div?

Upvotes: 1

Views: 1906

Answers (2)

facelessuser

Reputation: 1734

You can do this quite easily with CSS selectors, specifically the child combinator (https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator). NOTE: this uses Beautiful Soup 4.7+.

from bs4 import BeautifulSoup

html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.select('div.parent-div > *'))

Output

[<div class="child-1">\n<div class="child-1.1">\n</div>\n</div>, <div class="child-2">\n<div class="child-2.1">\n</div>\n</div>]
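The `*` in the selector above matches any direct child element. If only div children are wanted, the tag name can follow the combinator instead. A minimal sketch, assuming markup like the question's with a hypothetical extra span added to show the difference (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

# Same structure as the question, plus a hypothetical <span> child
# that is not in the original markup, added only to show the filtering.
html = """
<div class="parent-div">
    <div class="child-1"><div class="child-1.1"></div></div>
    <span class="not-a-div"></span>
    <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "> div" matches only div elements that are direct children of div.parent-div;
# the span and the nested child-1.1 / child-2.1 divs are skipped.
direct_divs = soup.select("div.parent-div > div")
direct_div_classes = [tag["class"][0] for tag in direct_divs]
print(direct_div_classes)
```

With `> *` the span would be included as well, so the choice depends on whether non-div children should count.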

Upvotes: 0

Omer Tekbiyik

Reputation: 4744

You can use the findChildren() method with recursive=False to get only the direct child tags.

main_cont = soup.find('div',{'class':'parent-div'}).findChildren('div',recursive=False)

Output:

[<div class="child-1"><div class="child-1.1"></div></div>, <div class="child-2"><div class="child-2.1"> </div></div>]
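For reference, findChildren() is a legacy alias of find_all() in Beautiful Soup 4, so the same result should be obtainable with find_all(..., recursive=False), or by filtering the .children iterator by hand. A rough sketch, assuming the markup from the question (the variable names are illustrative):

```python
from bs4 import BeautifulSoup, Tag

html = """
<div class="parent-div">
    <div class="child-1"><div class="child-1.1"></div></div>
    <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", class_="parent-div")

# recursive=False limits the search to direct children of `parent`.
direct = parent.find_all("div", recursive=False)

# Equivalent manual approach: .children also yields whitespace text nodes,
# so keep only Tag objects whose name is "div".
via_children = [
    child for child in parent.children
    if isinstance(child, Tag) and child.name == "div"
]

direct_classes = [tag["class"][0] for tag in direct]
children_classes = [tag["class"][0] for tag in via_children]
print(direct_classes)
```

The .children variant is more verbose but makes it explicit that text nodes between the tags are being skipped.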

Upvotes: 2
