While working on a scraper with BeautifulSoup, I ran into a problem: I need to find the tag that is the parent of the most <p> tags on a page. For example:
<div class="cls1">
<p>
<p>
<p>
</div>
<div class="cls2">
<p>
<P>
</div>
I need to get the tag that has the most direct children that are <p> elements. In the example above, that would be <div class="cls1">, since it contains 3 <p> tags, while .cls2 contains only 2.
Any suggestions on how to approach this, or is it even possible?
You can use the built-in max() function with a custom key= function:
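from bs4 import BeautifulSoup
data = '''<div class="cls1">
<p>
<p>
<p>
</div>
<div class="cls2">
<p>
<P>
</div>'''
# html5lib closes each unclosed <p> the way a browser would, so the <p> tags become siblings
soup = BeautifulSoup(data, 'html5lib')
# consider only <div>s with at least one direct <p> child, and pick the one
# with the most direct <p> children (recursive=False skips nested descendants)
print(max(soup.select('div:has(> p)'), key=lambda k: len(k.find_all('p', recursive=False))))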
Prints:
<div class="cls1">
<p>
</p><p>
</p><p>
</p></div>
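If the winning parent might not be a <div>, a single pass in which each <p> "votes" for its immediate parent avoids hard-coding the tag name. The sketch below is an alternative to the max() approach above, not part of the original answer. The function name parent_with_most_p is hypothetical, and it assumes html5lib is installed, as in the answer. It also returns None instead of raising ValueError when the page has no <p> tags, which is what max() would do on an empty selection.

```python
from collections import Counter

from bs4 import BeautifulSoup


def parent_with_most_p(html):
    """Return the tag with the most direct <p> children, or None if there are none.

    Hypothetical helper: each <p> votes for its immediate parent, so any tag
    name (div, section, td, ...) can win, not only <div>.
    """
    # html5lib closes unclosed <p> tags like a browser, so they become siblings
    soup = BeautifulSoup(html, 'html5lib')

    counts = Counter()
    parents = {}
    for p in soup.find_all('p'):
        parent = p.parent
        # key on id() because BeautifulSoup treats Tags with identical markup as
        # equal; id() keeps two look-alike parents counted separately
        counts[id(parent)] += 1
        parents[id(parent)] = parent

    if not counts:
        return None
    # most_common(1) returns [(key, count)]; on a tie the parent seen first wins
    best_id, _ = counts.most_common(1)[0]
    return parents[best_id]
```

With the data string from the answer, parent_with_most_p(data) returns the <div class="cls1"> tag. Counting parents in one pass also means the page is scanned once, rather than re-counting children for every candidate <div>.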