Jusrock

Reputation: 21

How do I check the total number of closed tags using BeautifulSoup?

The code below checks whether there is more than one opening html tag:

from bs4 import BeautifulSoup


invalid = """<html>
<html>

</html>
</html>"""

soup = BeautifulSoup(invalid, 'html.parser')
print(len(soup.find_all("html")))  # prints 2

valid = """<html>
</html></html>"""

soup = BeautifulSoup(valid, 'html.parser')
print(len(soup.find_all("html")))  # prints 1

But how do I check whether there is more than one closing html tag?

Upvotes: 1

Views: 382

Answers (1)

Steve Jessop

Reputation: 279255

I wouldn't use BeautifulSoup for this, because it's specifically a tag soup parser. It cleans up mismatched open and close tags for you, which is part of the point, so the extra closing tags are gone by the time you search the tree.

Instead, use the parser that BeautifulSoup uses. There's a standard one in Python, called HTMLParser in Python 2 and html.parser in Python 3. If you've read the BeautifulSoup documentation you know that others are available, such as lxml or html5lib.

So for example:

import html.parser

class Parser(html.parser.HTMLParser):
    count = 0
    def handle_endtag(self, tag):
        if tag == 'html':
            self.count += 1

parser = Parser()
parser.feed('<html></html><!-- </html> --></html>')
parser.close()
print(parser.count)

Output:

2
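
If you want to compare opening and closing tags directly, the same idea can be extended to count both. This is only a rough sketch for Python 3: TagCounter and count_tags are names I made up for illustration, not part of html.parser or BeautifulSoup.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: tallies start tags and end tags separately."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()  # tag name -> number of start tags seen
        self.closed = Counter()  # tag name -> number of end tags seen

    def handle_starttag(self, tag, attrs):
        self.opened[tag] += 1

    def handle_endtag(self, tag):
        self.closed[tag] += 1


def count_tags(markup):
    """Feed markup through TagCounter and return (opened, closed) counters."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return parser.opened, parser.closed


# The "valid" string from the question: one start tag, two end tags.
opened, closed = count_tags("<html>\n</html></html>")
print(opened["html"], closed["html"])  # prints: 1 2
```

Tags inside comments are still ignored, as in the example above. Note that a self-closing tag such as <br/> is counted as both a start and an end tag, because the default handle_startendtag calls both handlers.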

Upvotes: 1
