Reputation: 289
The code below shows that the HTML::TreeBuilder method look_down cannot find the "section" element. Why?
use strict;
use warnings;
use HTML::TreeBuilder;
my $html =<<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML
my $tree = HTML::TreeBuilder->new_from_content($html);
my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";
my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";
$tree->delete();
Output:
number of div elements found = 1
number of section elements found = 0
Upvotes: 1
Views: 538
Reputation: 40778
This worked for me:
my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0); # <-- Include unknown elements in tree
$tree->parse($html);
$tree->eof; # <-- Signal end of input so the parser can finish the tree
my @divs = $tree->look_down('attrname', 'div');
my @sections = $tree->look_down('attrname', 'section');
print "number of div elements found = ", scalar(@divs), "\n";
print "number of section elements found = ", scalar(@sections), "\n";
Output:
number of div elements found = 1
number of section elements found = 1
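To see the effect of ignore_unknown in isolation, the two settings can be compared side by side. This is only a sketch: the helper name count_sections is made up for illustration, and whether the default setting drops the element depends on whether the installed HTML::Tagset treats section as a known tag (the question's output suggests it did not).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

# Hypothetical helper: parse $html with the given ignore_unknown setting
# and return how many elements carry attrname="section".
sub count_sections {
    my ($ignore_unknown) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown($ignore_unknown);   # must be set before parsing
    $tree->parse($html);
    $tree->eof;                               # finish the parse
    my @found = $tree->look_down(attrname => 'section');
    $tree->delete;                            # free the tree
    return scalar @found;
}

printf "ignore_unknown(1): %d\n", count_sections(1);   # the default setting
printf "ignore_unknown(0): %d\n", count_sections(0);
```

If the first count is 0 and the second is 1, the element was dropped at parse time, so no look_down criteria could have found it.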
Upvotes: 3
Reputation: 86774
my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";
This found one element because it matched the attribute attrname with value div that happened to be on the <div> tag.
my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";
This matches nothing because the parsed tree contains no element with an attribute named attrname with value section.
They should be
my @divs = $tree->look_down(_tag => 'div');
...
my @sections = $tree->look_down(_tag => 'section');
This is all somewhat obscurely explained in the HTML::Element look_down documentation. There's no clear explanation of what a "criterion" is, and you'd have to read the entire page to find that the pseudo-attribute _tag refers to the tag name... but then carefully reading the entire page would probably save you hours of frustration in the long run :-)
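For reference, here is a rough sketch of the kinds of criteria look_down accepts: attribute/value pairs (including the _tag pseudo-attribute), regex values, and code references. The sample HTML and variable names are invented for illustration, and only well-known tags are used so that the result does not depend on how unknown elements are handled. Note that _tag => 'section' can only match if the section element actually made it into the tree.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented sample document using only long-established HTML tags.
my $html = '<html><body>'
         . '<div class="a" id="x"><p class="a">one</p></div>'
         . '<div class="b"></div>'
         . '</body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Criterion as an attribute/value pair; _tag stands for the tag name.
my $n_divs = () = $tree->look_down(_tag => 'div');

# Several pairs must all match on the same element.
my $n_div_a = () = $tree->look_down(_tag => 'div', class => 'a');

# The value may be a compiled regex instead of a literal string.
my $n_class_ab = () = $tree->look_down(class => qr/^[ab]$/);

# A code reference receives the element and returns true to accept it.
my $n_paras = () = $tree->look_down(sub { $_[0]->tag eq 'p' });

print "div elements:          $n_divs\n";
print "div with class a:      $n_div_a\n";
print "class matching [ab]:   $n_class_ab\n";
print "p elements (coderef):  $n_paras\n";

$tree->delete;
```

In list context look_down returns every matching element; in scalar context it returns only the first match, which is why the counts above force list context with the `() =` idiom.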
Upvotes: 3