Reputation: 8497
I am using WWW::Mechanize, HTML::TreeBuilder and HTML::Element in my Perl script to navigate through HTML documents.
I want to know how to search for an element that contains a certain string as its text.
Here is an example of an HTML document:
<html>
<body>
<ul>
  <li>
    <div class="red">Apple</div>
    <div class="abc">figure = triangle</div>
  </li>
  <li>
    <div class="red">Banana</div>
    <div class="abc">figure = square</div>
  </li>
  <li>
    <div class="green">Lemon</div>
    <div class="abc">figure = circle</div>
  </li>
  <li>
    <div class="blue">Banana</div>
    <div class="abc">figure = line</div>
  </li>
</ul>
</body>
</html>
I want to extract the text "square". To get it, I have to search for an element with these properties: it is a <div>, its class is "red", and its text is "Banana".
Then I need to get its parent (an <li> element), and from the parent the child whose text starts with "figure =", but this, and the rest, is easy.
I tried it this way:
use strict;
use warnings;
use utf8;
use Encode;
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::Element;

binmode STDOUT, ":utf8";

my $mech = WWW::Mechanize->new();
my $uri = 'http.....'; # URI of an existing HTML document
$mech->get($uri);
if ($mech->success() && $mech->is_html()) {
    my $resp = $mech->response();
    my $cont = $resp->decoded_content;
    my $root = HTML::TreeBuilder->new_from_content($cont);

    # this works, but returns 2 elements:
    my @twoElements = $root->look_down('_tag' => 'div', 'class' => 'red');

    # this returns an empty list:
    my @empty = $root->look_down('_tag' => 'div', 'class' => 'red', '_content' => 'Banana');

    # do something with @twoElements or @empty
}
What must I use instead of the last command to get the wanted element?
I am not looking for a workaround (I've found one). What I want is a native function of WWW::Mechanize, HTML::Tree, or any other CPAN module.
Upvotes: 2
Views: 1171
Reputation: 3484
Here's pseudocode/untested Perl:
my @twoElements = $root->look_down('_tag' => 'div', 'class' => 'red');
foreach my $e ( @twoElements ) {
    # content_list returns a list (not an array ref), so index it with ( ... )[0]
    next unless ($e->content_list)[0] eq 'Banana';
    my $e2 = $e->right; # get the sibling - might need to try left() depending on ordering
    my ($shape) = ($e2->content_list)[0] =~ /figure = (.+)/;
    # do something with $shape...
}
Not perfect, but it should get you started, and it's general enough to reuse easily. If you only care about one specific shape, replace
my ($shape) = ($e2->content_list)[0] =~ /figure = (.+)/;
with something like
my $shape;
$shape = 'square' if ($e2->content_list)[0] =~ /square/;
This might be a little cleaner:
my @elements = $root->look_down('_tag' => 'div', 'class' => 'red');
foreach my $e ( @elements ) {
    next unless $e->as_trimmed_text eq 'Banana';
    my $e2 = $e->right;
    my ($shape) = $e2->as_trimmed_text =~ /figure = (.+)/;
    # do something with $shape...
}
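Since the question asks for something built in: HTML::Element's look_down also accepts a code reference as a criterion. It is called with each candidate element as $_[0], and the element matches only if the sub returns true, so the text test can go straight into the search instead of a separate loop. (_content holds an array reference of child nodes rather than the text itself, which is presumably why '_content' => 'Banana' matches nothing.) A minimal sketch, assuming the sample markup from the question, inlined as a heredoc so it runs without WWW::Mechanize; it then goes up to the parent <li> and picks the child whose text starts with "figure =", as the question describes:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup from the question, inlined so the sketch runs without a network fetch.
my $root = HTML::TreeBuilder->new_from_content(<<'HTML');
<html><body><ul>
<li><div class="red">Apple</div><div class="abc">figure = triangle</div></li>
<li><div class="red">Banana</div><div class="abc">figure = square</div></li>
<li><div class="green">Lemon</div><div class="abc">figure = circle</div></li>
<li><div class="blue">Banana</div><div class="abc">figure = line</div></li>
</ul></body></html>
HTML

# A code-ref criterion is called with each candidate element as $_[0];
# an element matches only if all criteria, including the sub, are true.
my @matches = $root->look_down(
    _tag  => 'div',
    class => 'red',
    sub { $_[0]->as_trimmed_text eq 'Banana' },
);

# Same trick one level up: from the parent <li>, find the <div>
# whose text starts with "figure =".
my ($figure) = $matches[0]->parent->look_down(
    _tag => 'div',
    sub { $_[0]->as_trimmed_text =~ /^figure = / },
);
my ($shape) = $figure->as_trimmed_text =~ /^figure = (.+)/;

print scalar(@matches), " match: $shape\n";    # prints "1 match: square"

# HTML::Element trees contain circular references, so free the tree explicitly.
$root->delete;
```

as_trimmed_text is used here rather than the first item of content_list so that surrounding whitespace or nested markup inside a <div> does not break the comparison.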
Upvotes: 0