MP1

Reputation: 35

Find the parent tag of the most frequently occurring tag - BeautifulSoup 4

While working on a scraper with BeautifulSoup, I ran into a problem: I needed to find the tag that is the parent of the most <p> tags on a page. For example:

<div class="cls1">
   <p>
   <p>
   <p>
</div>
<div class="cls2">
   <p>
   <P>
</div>

I need to get the tag that has the most direct children that are <p> elements. In the above example, that would be <div class="cls1">, since it contains 3 <p> tags, whereas .cls2 contains only 2.
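To make "direct children" concrete: by default, BeautifulSoup's find_all() searches all descendants, while recursive=False limits it to immediate children. A minimal sketch with hypothetical markup (tags explicitly closed, parsed with the built-in 'html.parser'):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two <p> tags are direct children of the outer <div>;
# a third is nested one level deeper, inside <section>
html = '<div id="outer"><p>1</p><p>2</p><section><p>3</p></section></div>'
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")

all_p = outer.find_all("p")  # every <p> descendant
direct_p = outer.find_all("p", recursive=False)  # direct children only

print(len(all_p), len(direct_p))  # 3 2
```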

Any suggestions on how to approach this, or whether it is possible at all?

Upvotes: 0

Views: 101

Answers (1)

Andrej Kesely

Reputation: 195438

You can use the built-in max() function with a custom key= function:

data = '''<div class="cls1">
   <p>
   <p>
   <p>
</div>
<div class="cls2">
   <p>
   <P>
</div>'''

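from bs4 import BeautifulSoup

# 'html5lib' is a separate install (pip install html5lib); it closes the
# unclosed <p> tags the way a browser does
soup = BeautifulSoup(data, 'html5lib')

# Among the <div>s with at least one direct <p> child, pick the one with the
# most direct <p> children (recursive=False skips nested <p> descendants)
print(max(soup.select('div:has(> p)'), key=lambda k: len(k.find_all('p', recursive=False))))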

Prints:

<div class="cls1">
   <p>
   </p><p>
   </p><p>
</p></div>
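The max() call above only considers <div> elements. A variation on the same idea, sketched here under the assumption that the parent may be any tag, tallies each <p> against its immediate parent with collections.Counter, so only direct children are counted. The helper parent_with_most_children is hypothetical, not a bs4 API, and the sample markup closes every <p> so that the built-in 'html.parser' is enough; for unclosed <p> tags as in the question, parsing with 'html5lib' as above is the safer choice.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Assumption: every <p> is explicitly closed, so 'html.parser' builds
# the same tree a browser would
html = """
<div class="cls1"><p>a</p><p>b</p><p>c</p></div>
<div class="cls2"><p>d</p><p>e</p></div>
<section><p>f</p></section>
"""


def parent_with_most_children(soup, name):
    """Return the tag with the most direct children called `name`, or None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    counts = Counter()
    parents = {}
    for tag in soup.find_all(name):
        parent = tag.parent
        # Key on id() so two parents with identical markup are still
        # counted separately (bs4 Tag equality compares markup)
        counts[id(parent)] += 1
        parents[id(parent)] = parent
    if not counts:
        return None
    # most_common(1) breaks ties in favour of the parent whose first
    # matching child was seen earliest
    best_id, _ = counts.most_common(1)[0]
    return parents[best_id]


soup = BeautifulSoup(html, "html.parser")
best = parent_with_most_children(soup, "p")
print(best.name, best["class"])  # div ['cls1']
```

Because every <p> is visited once, this runs in a single pass over the document instead of re-counting children for each candidate <div>.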

Upvotes: 4
