Shang Zhang

Reputation: 289

Why does look_down method in HTML::Element fail to find <section> elements?

The code below shows that the look_down method (which HTML::TreeBuilder inherits from HTML::Element) cannot find the <section> element. Why?

use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

$tree->delete();

Output:

number of div elements found = 1
number of section elements found = 0
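One way to see what is going on is to look at the tree itself rather than only at the look_down results. The sketch below parses the same markup twice, once with HTML::TreeBuilder's default ignore_unknown setting and once with it turned off, and uses HTML::Element's dump method to print which elements actually made it into each tree. The shortened one-line markup is an assumption made to keep the output small. Whether the first tree contains <section> depends on whether your installed HTML::Tagset recognizes that tag, so no output is claimed for that case.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<div attrname="div"><section attrname="section"></section></div>';

# Parse the same markup with unknown elements dropped (1, the default) and kept (0).
my %found;
for my $ignore (1, 0) {
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown($ignore);
    $tree->parse($html);
    $tree->eof;    # tell the parser the document is complete

    # dump() prints the element tree to STDOUT, showing which tags survived.
    print "ignore_unknown($ignore):\n";
    $tree->dump;

    # "= () =" counts how many elements look_down returned.
    $found{$ignore} = () = $tree->look_down(attrname => 'section');
    $tree->delete;
}
print "sections found: default=$found{1}, kept=$found{0}\n";
```

If the first dump shows the <div> with no <section> child, the parser discarded the element before look_down ever ran.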

Upvotes: 1

Views: 538

Answers (2)

Håkon Hægland

Reputation: 40778

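This worked for me. By default HTML::TreeBuilder drops elements it does not recognize, and <section> (an HTML5 element) may not be among the tags it knows, so turn that behaviour off before parsing: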

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);  # <-- Include unknown elements in tree
$tree->parse($html);
$tree->eof;                # signal that the whole document has been parsed
my @divs = $tree->look_down('attrname', 'div');
my @sections = $tree->look_down('attrname', 'section');
print "number of div elements found = ", scalar(@divs), "\n";
print "number of section elements found = ", scalar(@sections), "\n";

Output:

number of div elements found = 1
number of section elements found = 1
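To check whether your installation treats a tag as "unknown" in the first place, you can inspect HTML::Tagset's %HTML::Tagset::isKnown hash. This sketch assumes that HTML::TreeBuilder consults that hash when ignore_unknown is on; which HTML5 tags are listed depends on the HTML::Tagset version installed, since older releases predate HTML5, so no particular output is promised here.

```perl
use strict;
use warnings;
use HTML::Tagset;

# %HTML::Tagset::isKnown is the set of element names HTML::Tagset recognizes.
# Assumption: HTML::TreeBuilder drops tags missing from it while ignore_unknown is true.
for my $tag (qw(div section article nav)) {
    my $status = $HTML::Tagset::isKnown{$tag} ? 'known' : 'unknown';
    print "$tag: $status\n";
}
```

If section is reported as unknown, ignore_unknown(0) is what keeps it in the tree.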

Upvotes: 3

Jim Garrison

Reputation: 86774

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

This found one element because it matched the attribute attrname with value div, which happens to be on the <div> tag.

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

This matches nothing because there's no tag with an attribute named attrname with value section.

They should be

my @divs = $tree->look_down(_tag => 'div');
...
my @sections = $tree->look_down(_tag => 'section');

This is all somewhat obscurely explained in the look_down section of the HTML::Element documentation. There's no clear explanation of what a "criterion" is, and you'd have to read the entire page to find the pseudo-attribute _tag, which refers to the tag name... but carefully reading the entire page would probably save you hours of frustration in the long run :-)
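For reference, here is a minimal sketch of the main kinds of look_down criteria: the _tag pseudo-attribute, an exact attribute value, a compiled regex as the value, and a code reference that receives each candidate element. Multiple criteria in one call must all match. The small markup and the class and id values are made up for illustration. ignore_unknown(0) is set, as in the other answer, so that <section> is kept in the tree whatever the installed tag tables say.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);   # keep tags such as <section> even if they are "unknown"
$tree->parse('<div class="a"><section id="s1" class="a"></section><section id="s2"></section></div>');
$tree->eof;

# 1. Tag name via the _tag pseudo-attribute.
my @sections = $tree->look_down(_tag => 'section');

# 2. Several criteria are ANDed: tag name AND exact attribute value.
my @a_sections = $tree->look_down(_tag => 'section', class => 'a');

# 3. A qr// value matches the attribute against a pattern.
my @ids = $tree->look_down(id => qr/^s\d$/);

# 4. A code ref gets the candidate element in $_[0]; a true return keeps it.
my @no_class = $tree->look_down(_tag => 'section', sub { !defined $_[0]->attr('class') });

print join(' ', scalar(@sections), scalar(@a_sections), scalar(@ids), scalar(@no_class)), "\n";
$tree->delete;
```

In list context look_down returns every match; in scalar context it returns only the first one, which is handy when you expect a single element.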

Upvotes: 3
