Kishan Mehta

Reputation: 2678

Python - Beautifulsoup count only outer tag children of a tag

HTML of page:

<form name="compareprd" action="">
    <div class="gridBox product " id="quickLookItem-1">
        <div class="gridItemTop">
        </div>
    </div>
    <div class="gridBox product " id="quickLookItem-2">
        <div class="gridItemTop">
        </div>
    </div>
    <!-- many more like this. -->

I am using Beautiful Soup to scrape a page. On that page I am able to get a form tag by its name.

tag = soup.find("form", {"name": "compareprd"})

Now I want to count only the immediate child divs of the form, not the divs nested deeper. Say, for example, there are 20 immediate divs inside the form. I tried:

len(tag.findChildren("div"))

But it gives 1500.

I think it returns every div nested anywhere inside the form tag.

Any help appreciated.

Upvotes: 1

Views: 1697

Answers (1)

Padraic Cunningham

Reputation: 180401

You can use a single CSS selector, form[name=compareprd] > div, which will find only the divs that are immediate children of the form:

html = """<form name="compareprd" action="">
<div class="gridBox product " id="quickLookItem-1">
    <div class="gridItemTop">
    </div>
</div>

<div class="gridBox product " id="quickLookItem-2">
    <div class="gridItemTop">
    </div>
</div>
</form>"""

from bs4 import BeautifulSoup

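# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, "html.parser")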


print(len(soup.select("form[name=compareprd] > div")))

Or, as commented, pass recursive=False, but use find_all; findChildren goes back to the bs2 days and is only provided for backwards compatibility.

len(tag.find_all("div", recursive=False))
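
To see the difference side by side, here is a minimal sketch that runs the three approaches against the two-item sample HTML from this answer. The variable names (form, all_divs, direct_divs, selected) are just illustrative, and the "html.parser" backend is an assumption; any installed parser should give the same counts for this snippet.

```python
from bs4 import BeautifulSoup

# Two-item sample taken from the question; the real page has many more items.
html = """<form name="compareprd" action="">
<div class="gridBox product " id="quickLookItem-1">
    <div class="gridItemTop"></div>
</div>
<div class="gridBox product " id="quickLookItem-2">
    <div class="gridItemTop"></div>
</div>
</form>"""

soup = BeautifulSoup(html, "html.parser")
form = soup.find("form", {"name": "compareprd"})

# Default (recursive=True): every div anywhere below the form.
all_divs = form.find_all("div")

# recursive=False: only divs that are direct children of the form.
direct_divs = form.find_all("div", recursive=False)

# CSS child combinator ">" expresses the same restriction.
selected = soup.select("form[name=compareprd] > div")

print(len(all_divs), len(direct_divs), len(selected))  # 4 2 2
print([d["id"] for d in direct_divs])
```

The default search is what produced the 1500 in the question: it counts the nested gridItemTop divs (and anything below them) as well as the outer gridBox divs.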

Upvotes: 2
