Danny Sullivan

Reputation: 3844

Parsing Nested Elements with HTML::TreeBuilder

I have some example HTML:

<div>
    <p>get this</p>
</div>
<p>not this</p>

Is there a way to get the nested element using HTML::TreeBuilder and look_down? I can call look_down again on the element returned by the first search:

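use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse("<div><p>get this</p></div><p>not this</p>");
$tree->eof;  # signal end of input so the tree is finalized
my $div = $tree->look_down(_tag => "div");  # first search: the div
my $p = $div->look_down(_tag => "p");       # second search: the p inside it
print $p->as_text() . "\n";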

Is it possible to do this in a single search, similar to the CSS selector div p? Or am I limited to XPath?

Upvotes: 0

Views: 452

Answers (1)

choroba

Reputation: 241918

You can call look_up from each p to check whether it is contained in a div:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use HTML::TreeBuilder;

# Criterion for look_down: true for a p element that has a div ancestor.
sub paragraph_whose_ancestor_is_div {
    my $node = shift;
    return 'p' eq $node->tag && $node->look_up(_tag => 'div');
}

my $tree = 'HTML::TreeBuilder'->new;
$tree->parse("<html><div><p>get this</p></div><p>not this</p></html>");
$tree->eof;  # signal end of input so the tree is finalized

my @p = $tree->look_down(\&paragraph_whose_ancestor_is_div);

say $_->as_text() for @p;
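
The same idea can be generalised to other tag pairs. The following is only a sketch: descendant_of is a hypothetical helper, not part of HTML::TreeBuilder, that builds a look_down criterion roughly equivalent to the CSS selector "ancestor tag". Since look_up also tests the element it is called on, the sketch starts the upward search at the parent, so that a pair such as div div would not match a lone div.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use HTML::TreeBuilder;

# Hypothetical helper (an assumption for illustration, not a library call):
# returns a coderef usable as a look_down criterion. It matches $tag
# elements that have an $ancestor_tag element somewhere above them.
sub descendant_of {
    my ($tag, $ancestor_tag) = @_;
    return sub {
        my $node = shift;
        return 0 unless $node->tag eq $tag;
        # look_up also tests its own element, so begin at the parent.
        my $parent = $node->parent or return 0;
        return defined $parent->look_up(_tag => $ancestor_tag) ? 1 : 0;
    };
}

my $tree = 'HTML::TreeBuilder'->new;
$tree->parse('<html><div><p>get this</p></div><p>not this</p></html>');
$tree->eof;

# List context returns every match; scalar context returns the first one.
my @p = $tree->look_down(descendant_of('p', 'div'));

say $_->as_text for @p;
```

With the sample document this should print only "get this". Like the CSS descendant selector, it matches at any depth, not just direct children.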

Upvotes: 1
