Reputation: 929
I'm writing an analysis tool that counts how many children each HTML tag in the source code has.
I parsed the code with BeautifulSoup, and now I want to iterate over every tag in the page and count how many children it has.
What is the best way to iterate over all the tags? And how can I, for example, get all the tags that do not have any children?
Upvotes: 4
Views: 9315
Reputation: 16615
I am not sure from your question whether you want to find all children of an element recursively or only the direct children.
To iterate over the direct children, use the children attribute.
# in general
for child in element.children:
    pass  # do something with child
I found it surprising that len() does not work on .children. The property is an iterator, not a list:
(property) children: Iterable[PageElement]
So this does not work:
len(element.children)  # does not work: raises TypeError
It is somewhat awkward, but you could use the loop to count the number of children, or call len(element.contents), since .contents is a plain list.
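As a minimal sketch of the loop-based count, assuming bs4 is installed: the helper name count_children and the sample HTML are made up for illustration. Note that .children yields text nodes as well as tags.

```python
from bs4 import BeautifulSoup

def count_children(element):
    """Count direct children by exhausting the .children iterator (hypothetical helper)."""
    return sum(1 for _ in element.children)

# Made-up sample document, parsed with the built-in html.parser
soup = BeautifulSoup("<ul><li>one</li><li>two</li></ul>", "html.parser")

ul_count = count_children(soup.ul)  # the two <li> tags
li_count = count_children(soup.li)  # the single text node "one"
print(ul_count, li_count)
```

For this snippet the count matches len(soup.ul.contents), so the loop is only needed when you want to filter while counting.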
Upvotes: 0
Reputation: 377
Don't reinvent the wheel... especially not in ways that don't roll. BeautifulSoup does count the children for you, unsurprisingly.
from bs4 import BeautifulSoup as BS

# Name the parser explicitly to avoid the "no parser specified" warning
doc = BS('<html><head><title>Example</title></head><body><h1>The Truth</h1>'
         + '<p>It is out there, Neo.</p></body></html>', 'html.parser')
print(len(doc.html))
# 2, head and body
print(len(list(x for x in doc.html.find_all())))
# 5, because find_all() finds all descendant tags, not just direct children
print(len(list(x for x in doc.html.children)))
# 2, but instead of letting BeautifulSoup count it as it deems best,
# you actually gather the pieces and count them yourself
print(len(doc.html.contents))
# 2, functionally the same as the prior, just more readable
Upvotes: 0
Reputation: 3436
You can count the tag's children by using the len() function. It also works on the list that a search returns:
meta_tags = soup.find_all('meta', property="article:tag")
if len(meta_tags) < 1:
    return False  # inside a function: no matching tags were found
Upvotes: 2
Reputation: 1960
If you use find_all() with no arguments, you can iterate over every tag.
You can get how many children a tag has by using len(tag.contents).
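A short sketch of combining the two, assuming bs4 is installed; the sample HTML and the name child_counts are made up for illustration. The counts include text nodes, because .contents holds every direct child.

```python
from bs4 import BeautifulSoup

# Made-up sample document for illustration
html = "<html><body><h1>Title</h1><p>One <b>two</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Pair each tag name with its number of direct children (text nodes included)
child_counts = [(tag.name, len(tag.contents)) for tag in soup.find_all()]
for name, count in child_counts:
    print(name, count)
```

Here <p> has two children, the text "One " and the <b> tag, while <h1> has one, its text.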
To get a list of all tags with no children:
from bs4 import BeautifulSoup

with open('someHTMLFile.html', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')
body = soup.body
empty_tags = []
for tag in body.find_all():
    # .contents lists the direct children, including text nodes
    if len(tag.contents) == 0:
        empty_tags.append(tag)
print(empty_tags)
or...
empty_tags = [tag for tag in soup.body.find_all() if len(tag.contents) == 0]
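Because .contents also counts text nodes, a tag such as <p>text</p> is not "empty" by the test above. If the goal is tags with no child tags, one possible variant, sketched here with made-up sample HTML and assuming bs4 is installed, is to check the children for Tag instances:

```python
from bs4 import BeautifulSoup, Tag

# Made-up sample document for illustration
html = "<div><p>text only</p><br/><span><b>bold</b></span></div>"
soup = BeautifulSoup(html, "html.parser")

# Tags with nothing inside at all (no tags, no text)
empty_tags = [t.name for t in soup.find_all() if len(t.contents) == 0]

# Tags with no child tags, though they may still contain text
leaf_tags = [t.name for t in soup.find_all()
             if not any(isinstance(c, Tag) for c in t.children)]
print(empty_tags, leaf_tags)
```

Which of the two definitions fits depends on whether text should count as a child in your analysis.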
Upvotes: 4
Reputation: 1149
I use BeautifulSoup for the same thing, using the findChildren method of each element.
In the code below, fullData contains the HTML string of the web page.
soup = BeautifulSoup(fullData, 'html.parser')
elements = soup.find_all()

def findElements(el):
    # findChildren() returns all descendant tags, so an empty list means a leaf tag
    temp = el.findChildren()
    if len(temp) == 0:
        print(el.get_text())

for el in elements:
    findElements(el)
Hope this helps
Upvotes: 0