Perl : Extract an HTML element with a particular class using HTML::TokeParser

Question

I'm trying to extract the HTML content of <td> tags that have the class "tablehead1".

<td class="tablehead1"> Market </td>

While parsing, I'm getting the text content of every <td> tag in the whole HTML file, but I need only the content of the <td> tags with the class "tablehead1".

Where am I going wrong in the code below?

use HTML::TokeParser;

open(DATA, "<...");
my $p = HTML::TokeParser->new(*DATA);

while (my $token = $p->get_tag('td')) {

 my $url = $token->[1]{class} || "tablehead1";
 my $text = $p->get_trimmed_text("/td");

 if (length($text) < 30 && length($text) > 0) { print "$text\n"; }
}

Nemesis · Accepted Answer

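Your code never actually checks whether the class is tablehead1. The expression $token->[1]{class} || "tablehead1" only supplies a default value when the tag has no class attribute, and the result is stored in $url, which is never used afterwards.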

Replacing

my $url = $token->[1]{class} || "tablehead1";

with

next unless $token->[1]{class} eq "tablehead1";

should give you the expected results. You should also check that the tag actually has a class attribute, since comparing an undefined value with eq triggers a warning under use warnings, e.g. with

next unless grep( /^class$/, @{$token->[2]} ) && $token->[1]{class} eq "tablehead1";
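Putting the pieces together, here is a minimal sketch of the whole loop with the class check in place. The helper name td_text_by_class and the sample HTML are made up for illustration and are not part of the original post. The sketch parses from a string instead of a file handle, and it tests for the attribute with defined rather than grep, which should be equivalent for this purpose.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): returns the trimmed text
# of every <td> whose class attribute is exactly $class.
sub td_text_by_class {
    my ($html, $class) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

    my @cells;
    while (my $token = $p->get_tag('td')) {
        my $attr = $token->[1];    # hash ref of the tag's attributes

        # Skip cells that have no class attribute or a different class.
        next unless defined $attr->{class} && $attr->{class} eq $class;

        push @cells, $p->get_trimmed_text('/td');
    }
    return @cells;
}

# Made-up sample input for illustration.
my $html = <<'HTML';
<table>
  <tr><td class="tablehead1"> Market </td><td class="other">Skip me</td></tr>
  <tr><td>No class</td><td class="tablehead1">Price</td></tr>
</table>
HTML

print "$_\n" for td_text_by_class($html, 'tablehead1');
```

Note that eq only matches when the class attribute is exactly "tablehead1". A cell with several classes, such as class="tablehead1 wide", would be skipped by this sketch.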
