Kim Hyesung

Python: how to count the number of opening and closing tags in HTML

How can I count the number of opening and closing tags in an HTML file?

ya.html

<div class="side-article txt-article">
<p>
    <strong>
    </strong> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<p>
    <br>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<br>

My code:

from bs4 import BeautifulSoup

with open('ya.html') as f:
    soup = BeautifulSoup(f, "html.parser")

# find_all() with no arguments returns every element (Tag object) in the document
num_appearances_of_tag = len(soup.find_all())

print(num_appearances_of_tag)

The output:

13

But this is not what I want: my code counts <p> ... </p> as one tag, while I want to count the opening and closing tags separately.

How can I count the opening and closing tags separately, so that the output is

23

Thanks!

Answers (1)

Martin Gottweis

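I suggest using the HTMLParser class from Python's standard library. Subclass it and count the calls to handle_starttag and handle_endtag separately: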

from html.parser import HTMLParser  # Python 3 standard library


# Create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.number_of_starttags = 0
        self.number_of_endtags = 0

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag, e.g. <p>
        self.number_of_starttags += 1

    def handle_endtag(self, tag):
        # Called once for each closing tag, e.g. </p>
        self.number_of_endtags += 1


# Instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>')

print(parser.number_of_starttags, parser.number_of_endtags)  # prints: 5 5
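Applied to the HTML from the question, the same idea produces the expected total. The sketch below is a minimal, self-contained variant, not the answer's exact code: it inlines the contents of ya.html as QUESTION_HTML, and the names TagCounter and count_tags are hypothetical helpers introduced here. It also keeps per-tag counts in a collections.Counter so you can see where the numbers come from.

```python
from collections import Counter
from html.parser import HTMLParser

# Contents of ya.html from the question, inlined so the sketch runs on its own.
QUESTION_HTML = """<div class="side-article txt-article">
<p>
    <strong>
    </strong>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<p>
    <br>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
</p>
<p>
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
    </a>
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
    </a>
</p>
<br>
"""


class TagCounter(HTMLParser):
    """Hypothetical helper: counts opening and closing tags separately, per tag name."""

    def __init__(self):
        super().__init__()
        self.start_tags = Counter()
        self.end_tags = Counter()

    def handle_starttag(self, tag, attrs):
        # One call per opening tag as written in the source, e.g. <p> or <br>
        self.start_tags[tag] += 1

    def handle_endtag(self, tag):
        # One call per closing tag as written in the source, e.g. </p>
        self.end_tags[tag] += 1


def count_tags(html):
    """Return (start_counts, end_counts) as Counters keyed by tag name."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()  # flush any data the parser is still buffering
    return parser.start_tags, parser.end_tags


# To read the file instead: count_tags(open("ya.html", encoding="utf-8").read())
starts, ends = count_tags(QUESTION_HTML)
n_start = sum(starts.values())
n_end = sum(ends.values())
print(n_start, n_end, n_start + n_end)  # prints: 13 10 23
```

HTMLParser reports only the tags literally present in the source, so the unclosed <div> and the two <br> tags contribute opening tags but no closing tags. That is why the total is 13 + 10. BeautifulSoup, by contrast, builds a tree of elements, so each <p> ... </p> pair becomes a single Tag. If a file contains XHTML-style tags such as <br/>, HTMLParser calls handle_startendtag instead. Its default implementation calls handle_starttag and then handle_endtag, so such a tag is counted once as opening and once as closing. Override handle_startendtag if you want these tags counted differently.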
