sr1

Reputation: 251

Perl: Extract an HTML element with a particular class using HTML::TokeParser

I'm trying to extract the content of the <td> tags that have the class "tablehead1".

<td class="tablehead1"> Market </td>

While parsing, I get the text of every <td> tag in the whole HTML file, but I need only the content of the <td> tags with the class "tablehead1".

Where am I going wrong in the code below?

use HTML::TokeParser;

open(DATA,"<KeyStats.html") or die "Can't open data";
my $p = HTML::TokeParser->new(*DATA);

while (my $token = $p->get_tag('td')) {

 my $url = $token->[1]{class} || "tablehead1";
 my $text = $p->get_trimmed_text("/td");

 if (length($text)<30&&length($text)>0) {  print "$text\n"; }
}

Upvotes: 1

Views: 245

Answers (1)

Nemesis

Reputation: 2334

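You never actually check whether the class is tablehead1. Your assignment to $url only stores the class, falling back to "tablehead1" when the tag has none, and $url is never used afterwards, so every <td> gets printed.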

Replacing

my $url = $token->[1]{class} || "tablehead1";

by

next unless $token->[1]{class} eq "tablehead1";

should give you the expected results. You should also check that the <td> actually has a class attribute (otherwise the eq comparison runs against an undefined value), e.g. with

next unless grep( /^class$/, @{$token->[2]} ) && $token->[1]{class} eq "tablehead1";
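Putting the pieces together, here is a minimal sketch of how the corrected loop might look. The function name td_text_by_class and the inline sample HTML are hypothetical, made up for illustration; the original script reads KeyStats.html instead. Splitting the class attribute on whitespace so that a tag like class="tablehead1 wide" also matches is an addition of this sketch, not something the fix above requires.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <td> whose
# class attribute contains $class.
sub td_text_by_class {
    my ($html, $class) = @_;

    # A scalar reference makes the parser read the string as the document.
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse HTML";

    my @texts;
    while (my $token = $p->get_tag('td')) {
        # For a start tag, $token is [$tag, \%attr, \@attrseq, $text].
        my $attr = $token->[1]{class};

        # Skip cells without a class attribute (avoids eq on undef).
        next unless defined $attr;

        # A class attribute may hold several space-separated names.
        next unless grep { $_ eq $class } split ' ', $attr;

        push @texts, $p->get_trimmed_text('/td');
    }
    return @texts;
}

# Sample input, assumed for illustration only.
my $html = '<table><tr>'
         . '<td class="tablehead1"> Market </td>'
         . '<td class="other">Skip</td>'
         . '<td>No class</td>'
         . '</tr></table>';

print "$_\n" for td_text_by_class($html, 'tablehead1');
```

With the sample input above, only the cell with class tablehead1 passes the filter. A defined check is used here in place of the grep over the attribute sequence; for this purpose the two should behave the same.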

Upvotes: 1
