Count html tags with Perl regex

Question

I'm trying to parse an HTML file to count HTML tags. I'm not very familiar with regular expressions, though.

My current code counts line by line, not tag by tag, and it prints the whole line.

while (<>) {    # read the HTML file line by line
    while (/(<[^\/][a-z].*>)/gi) {
        print $_;
        $count++;
    }
}

Suppose we have a line like this in the file:

blahblahblah hello

blah

I need to extract the opening tag of every HTML tag and also tags like

,

and .

Could you please point me in the right direction?

coder hacker · Accepted Answer

If you want to count HTML tags within a document, I suggest that you use HTML::TreeBuilder.

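use strict;
use warnings;
use HTML::TreeBuilder;
use LWP::Simple;

my $url = "http://www.google.com";

# Fetch the page; get() returns undef on failure.
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

my $tree = HTML::TreeBuilder->new();

$tree->parse($content);
$tree->eof();    # signal that the whole document has been fed in

# In list context, look_down() returns every matching element.
my @div_tags = $tree->look_down( '_tag', 'div' );

# An array in scalar context gives its element count.
my $size = @div_tags;
print "$size\n";

$tree->delete;   # free the parse tree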

You can now pass a different tag name instead of div to count whichever tags you need. I suggest studying HTML::TreeBuilder, as it is a very useful module and you may find other helpful methods in it.
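
To count every tag in one pass rather than one name at a time, the tree can be walked once and tallied by tag name. The following is a minimal sketch, not part of the original answer: count_tags is a hypothetical helper, the sample HTML string is made up for illustration, and it assumes HTML::TreeBuilder is installed. It skips elements that the parser added implicitly (such as a missing html, head, or body), so only tags present in the source should be counted.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (an assumption, not from the original answer):
# returns a hash ref mapping each tag name to how often it occurs.
sub count_tags {
    my ($html) = @_;

    # new_from_content() parses the string and finalizes the tree.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my %count;
    # Visit the root element and all of its descendant elements
    # (descendants() returns elements only, not text segments).
    for my $el ( $tree, $tree->descendants ) {
        # Skip elements the parser inserted that were not in the source.
        next if $el->implicit;
        $count{ $el->tag }++;
    }

    $tree->delete;    # free the parse tree
    return \%count;
}

# Made-up sample input, for illustration only.
my $counts = count_tags('<div><p>one</p><p>two<br></p></div>');
printf "%-6s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Because a closing tag does not create a separate element in the tree, this counts each element once, which matches the goal of counting opening tags and standalone tags.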
