Dan

Reputation: 929

Get tag's children count with BeautifulSoup

I'm writing an analysis tool that counts how many children each HTML tag in the source code has.

I parsed the code with BeautifulSoup, and now I want to iterate over every tag in the page and count how many children it has.

What is the best way to iterate over all the tags? And how can I, for example, get all the tags that do not have any children?

Upvotes: 4

Views: 9315

Answers (5)

user2138149

Reputation: 16615

I am not sure from your question whether you want to find all children of an element recursively, or only the direct children.

To iterate over the direct children, use the .children attribute.

# in general
for child in element.children:
    ...  # do something with child

I found it surprising that len() does not work on .children. The attribute is typed as an iterable, not a list:

(property) children: Iterable[PageElement]

So this does not work:

len(element.children)  # raises TypeError

It is somewhat awkward, but you can count the children by incrementing a counter inside the loop.
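As a minimal sketch of counting without a hand-written loop: you can feed the iterator to sum(), or take len() of .contents, which is the list of the same direct children. The sample HTML and the variable names below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, only for illustration.
soup = BeautifulSoup("<ul><li>one</li><li>two</li><li>three</li></ul>", "html.parser")
element = soup.ul

# .children is an iterator, so count by consuming it...
count_from_iterator = sum(1 for _ in element.children)

# ...or use .contents, the list holding the same direct children.
count_from_list = len(element.contents)

print(count_from_iterator, count_from_list)
```

Both numbers should agree, since .children iterates over the items in .contents.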

Upvotes: 0

Akos

Reputation: 377

Don't reinvent the wheel... especially not in ways that don't roll. BeautifulSoup does count the children for you, unsurprisingly.

from bs4 import BeautifulSoup as BS
doc = BS('<html><head><title>Example</title></head><body><h1>The Truth</h1>'
         '<p>It is out there, Neo.</p></body></html>', 'html.parser')
print(len(doc.html))
# 2: head and body
print(len(doc.html.find_all()))
# 5: find_all() returns every descendant tag (head, title, body, h1, p)
print(len(list(doc.html.children)))
# 2, but instead of letting BeautifulSoup count for you,
# you gather the pieces and count them yourself
print(len(doc.html.contents))
# 2, functionally the same as the previous line, just more readable
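One caveat worth illustrating: len(tag) and .contents count text nodes as well as tags, so the whitespace between tags in indented HTML is included in the count. The following is a sketch, assuming you only care about child tags; the sample markup is invented.

```python
from bs4 import BeautifulSoup

# Invented sample: two paragraphs inside a div, written with indentation.
html = """<div>
  <p>first</p>
  <p>second</p>
</div>"""
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# All direct children, including the whitespace-only strings between the tags.
all_children = len(div.contents)

# Only direct child tags: find_all() skips strings, and recursive=False
# restricts the search to one level below div.
tag_children = len(div.find_all(recursive=False))

print(all_children, tag_children)
```

Here the first number is larger than the second because of the newline and indentation strings around the two p tags.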

Upvotes: 0

Mohsin Mahmood

Reputation: 3436

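You can count elements with the len() function. For a tag's direct children that is len(tag.contents); the same works on the list returned by a search:

meta_tags = soup.find_all('meta', property="article:tag")
if len(meta_tags) < 1:
    return False  # inside a function: no matching <meta> tags were found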

Upvotes: 2

Cabrera

Reputation: 1960

If you use find_all() with no arguments you can iterate over every tag.

You can get how many children a tag has by using len(tag.contents).

To get a list of all tags with no children:

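from bs4 import BeautifulSoup

# Parse the file and close it again once parsing is done
with open('someHTMLFile.html', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')
body = soup.body

empty_tags = []

# find_all() with no arguments yields every tag under body
for tag in body.find_all():
    if len(tag.contents) == 0:
        empty_tags.append(tag)

print(empty_tags)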

or...

empty_tags = [tag for tag in soup.body.find_all() if len(tag.contents) == 0]
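Because .contents also holds text, a tag such as <p>hello</p> has one child by the test above. If what you want is a per-tag count of child tags only, one possible variation is sketched below; the sample HTML and the helper name count_child_tags are hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

def count_child_tags(tag):
    """Number of direct children of `tag` that are themselves tags."""
    return sum(1 for child in tag.children if isinstance(child, Tag))

# Made-up sample document for illustration.
html = "<body><div><p>hello</p><br/></div><span></span></body>"
soup = BeautifulSoup(html, "html.parser")

# Pair each tag under <body> with its number of child tags, in document order.
counts = [(tag.name, count_child_tags(tag)) for tag in soup.body.find_all()]

# Tags that contain no other tags (they may still contain text).
leaf_tags = [name for name, n in counts if n == 0]

print(counts)
print(leaf_tags)
```

Whether text should count as a child depends on what the analysis is for, so pick the test that matches your definition.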

Upvotes: 4

Anant Gupta

Reputation: 1149

I use BeautifulSoup for the same thing, using the findChildren method of each element.

In the code below, fullData contains the HTML string of the web page.

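from bs4 import BeautifulSoup

soup = BeautifulSoup(fullData, 'html.parser')
elements = soup.find_all()

def findElements(el):
    # findChildren() returns every descendant tag of el;
    # an empty result means el contains no other tags
    temp = el.findChildren()
    if len(temp) == 0:
        print(el.get_text())

for el in elements:
    findElements(el)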

Hope this helps

Upvotes: 0
