Reputation: 797
How to count the number of opening and closing tags in HTML
ya.html
<div class="side-article txt-article">
<p>
<strong>
</strong>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
<a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
</a>
</p>
<p>
<br>
</p>
<p>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
</p>
<p>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
<a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
</a>
</p>
<br>
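My code:
from bs4 import BeautifulSoup

with open('ya.html') as f:
    soup = BeautifulSoup(f, "html.parser")

# find_all() with no arguments returns every element in the tree,
# so each <p>...</p> pair is counted once
num_apperances_of_tag = len(soup.find_all())
print(num_apperances_of_tag)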
The output:
13
But this is not what I want, because my code counts <p> </p>
as one, while I want to count the opening and closing tags separately.
How can I count the number of opening and closing tags in the HTML, so that the output is
23
Thanks.
Upvotes: 2
Views: 1830
Reputation: 2739
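I suggest using Python's built-in HTMLParser for this. It calls a separate handler for every start tag and every end tag, so the two can be counted independently: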
# Python 3; on Python 2 use: from HTMLParser import HTMLParser
from html.parser import HTMLParser

number_of_starttags = 0
number_of_endtags = 0

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # called once for every opening tag, e.g. <p> or <br>
        global number_of_starttags
        number_of_starttags += 1

    def handle_endtag(self, tag):
        # called once for every closing tag, e.g. </p>
        global number_of_endtags
        number_of_endtags += 1

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>')
print(number_of_starttags, number_of_endtags)
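To check this against the markup from the question, here is a minimal sketch of the same idea applied to the contents of ya.html. It assumes Python 3, and the names TagCounter, count_tags and SAMPLE are made up for illustration. The counters live on the parser instance rather than in globals, which is only a design preference so that the parser can be reused.

```python
from html.parser import HTMLParser  # Python 3 standard library


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags and end tags separately."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0
        self.end_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_endtag(self, tag):
        self.end_tags += 1


def count_tags(html):
    """Return (number of start tags, number of end tags) for an HTML string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.start_tags, counter.end_tags


# The markup from the question, pasted inline so the sketch is self-contained.
SAMPLE = """
<div class="side-article txt-article">
<p>
<strong>
</strong>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
<a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
</a>
</p>
<p>
<br>
</p>
<p>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
</p>
<p>
<a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
</a>
<a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
</a>
</p>
<br>
"""

starts, ends = count_tags(SAMPLE)
print(starts, ends, starts + ends)  # prints: 13 10 23
```

The 13 start tags match what find_all() reported, and the 10 end tags bring the total to the 23 asked for; <div> and the two <br> tags have no closing tag in the sample. To read the file instead, pass the result of open('ya.html').read() to count_tags. Note that a self-closing tag such as <br/> goes through handle_startendtag, which by default calls both handlers, so it would be counted once in each total.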
Upvotes: 3