Reputation: 478
I am trying to parse an HTML file with my Perl script. I am using a module called HTML::TreeBuilder.
Here is what I have so far:
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new;
$tree->parse_file("sample.html");
foreach my $anchor ($tree->find("p")) {
    print $anchor->as_text, "\n";
}
It is working fine. I am getting everything inside the <p> tags.
sample.html file:
<td>Release Version:</td><td> 5134</td></tr>
<tr class="d0"><td>Executed By:</td><td>spoddar</td></tr>
<tr class="d1"><td> Duration:</td><td>0 Hrs 0 Mins 0 Secs </td></tr>
<tr class="d0"><td>#TCs Executed:</td><td>1</td></tr>
I want 5134 to be printed when I pass "Release Version".
In the same way, I want spoddar to be printed when I pass "Executed By".
These labels are not HTML tags, but is there any way to obtain the values next to them?
Upvotes: 1
Views: 4305
Reputation: 71
HTML::Parser and HTML::TokeParser may also be of use to you.
UNTESTED
use HTML::TokeParser;
my $p = HTML::TokeParser->new('sample.html') or die "Cannot open sample.html: $!";
while (my $token = $p->get_token) {
    my $tokenType = shift @{$token};    # 'S' is start tag, 'E' is end tag, etc. (see doc)
    next unless $tokenType eq 'S';
    my ($tag, $attr, $attrseq, $rawtxt) = @{$token};
    my $class = $attr->{class} // '';   # tag class, empty string if the tag has none
    if ($tag eq 'tr' && $class eq 'd0') {
        # Method calls are not interpolated inside double quotes, so call it outside the string
        print $p->get_trimmed_text('/tr'), "\n";
    }
}
Upvotes: 2
Reputation: 6798
The most straightforward thing to do is to filter the tags you want and look through the text. The following approach assumes the format you have in the sample, with a 2-column table.
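sub get_value {
    my $key = shift;
    foreach my $tr ($tree->find('tr')) {
        my @td = $tr->find('td');    # search within this row, not the whole tree
        next unless @td >= 2;
        # as_trimmed_text ignores stray whitespace such as the space before "5134"
        return $td[1]->as_trimmed_text if $td[0]->as_trimmed_text eq $key;
    }
    return;
}
print get_value('Release Version:'), "\n";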
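If several values are needed, one possible variation is to walk the rows once and collect every label/value pair into a hash. The following is only a sketch under some assumptions: the sample rows are inlined so it runs standalone, the wrapping <table> and the opening <tr class="d1"> of the first row are guesses (the posted sample does not show them), and the %value_for hash name is made up for illustration. The trailing colon is stripped from each label so lookups can use the bare name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Inline copy of the sample rows. The <table> wrapper and the first
# <tr class="d1"> are assumptions; the posted sample does not show them.
my $html = <<'HTML';
<table>
<tr class="d1"><td>Release Version:</td><td> 5134</td></tr>
<tr class="d0"><td>Executed By:</td><td>spoddar</td></tr>
<tr class="d1"><td> Duration:</td><td>0 Hrs 0 Mins 0 Secs </td></tr>
<tr class="d0"><td>#TCs Executed:</td><td>1</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# %value_for is a hypothetical name: label (without the colon) => value.
my %value_for;
foreach my $tr ($tree->find('tr')) {
    my @td = $tr->find('td');          # cells of this row only
    next unless @td >= 2;              # skip rows that are not label/value pairs
    (my $label = $td[0]->as_trimmed_text) =~ s/:\z//;
    $value_for{$label} = $td[1]->as_trimmed_text;
}
$tree->delete;                         # release the parse tree

print "$value_for{'Release Version'}\n";
print "$value_for{'Executed By'}\n";
```

This keeps the same 2-column assumption as get_value above; rows with fewer than two cells are skipped rather than treated as errors.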
Upvotes: 3