Reputation: 251
I'm trying to extract the text content of the <td> tags that have the class "tablehead1".
<td class="tablehead1"> Market </td>
While parsing, I'm getting the text content of every <td> tag in the whole HTML file, but I need only the content of <td> tags with the class "tablehead1".
Where am I going wrong in the code below?
use HTML::TokeParser;
open(DATA,"<KeyStats.html") or die "Can't open data";
my $p = HTML::TokeParser->new(*DATA);
while (my $token = $p->get_tag('td')) {
    my $url = $token->[1]{class} || "tablehead1";
    my $text = $p->get_trimmed_text("/td");
    if (length($text) < 30 && length($text) > 0) { print "$text\n"; }
}
Upvotes: 1
Views: 245
Reputation: 2334
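You never actually check whether the class is "tablehead1". The expression $token->[1]{class} || "tablehead1" only supplies a default value for $url; it does not skip any cells.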
Replacing
my $url = $token->[1]{class} || "tablehead1";
with
next unless $token->[1]{class} eq "tablehead1";
should give you the expected results. You should also check that the <td> actually has a class attribute (otherwise the comparison runs against an undefined value), e.g. with
next unless grep( /^class$/, @{$token->[2]} ) && $token->[1]{class} eq "tablehead1";
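Putting the pieces together, here is a minimal sketch of the whole loop with the class check in place. The helper name cells_with_class and the inline sample HTML are assumptions made for illustration (the sample stands in for KeyStats.html), and the attribute check uses defined on the attribute hash rather than the grep shown above. The match is exact, so a cell with class="tablehead1 other" would not be selected.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <td> whose class
# attribute is exactly $class.
sub cells_with_class {
    my ($source, $class) = @_;    # $source: file name, file handle, or \$html
    my $p = HTML::TokeParser->new($source)
        or die "Can't open source: $!";
    my @cells;
    while (my $token = $p->get_tag('td')) {
        my $attr = $token->[1];   # hash ref of the tag's attributes
        # Skip cells that have no class attribute or a different class.
        next unless defined $attr->{class} && $attr->{class} eq $class;
        push @cells, $p->get_trimmed_text('/td');
    }
    return @cells;
}

# Inline sample document (an assumption standing in for KeyStats.html).
my $html = <<'HTML';
<table>
  <tr><td class="tablehead1"> Market </td><td>123</td></tr>
  <tr><td class="other">Skip me</td><td class="tablehead1">Volume</td></tr>
</table>
HTML

print "$_\n" for cells_with_class(\$html, 'tablehead1');
```

With the sample above this prints Market and Volume, one per line. To read the real file, pass "KeyStats.html" instead of \$html; the length filter from the question can be applied to the returned list if it is still needed.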
Upvotes: 1