Reputation: 113
What expression should I use to find all td nodes after the one that contains the text Foo or Bar, stopping before the next <td colspan="4"> with unknown text? Thanks.
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
...
<td colspan="4">VARIABLE</td>
...
UPDATE:
use strict;
use warnings;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
my $url = 'http://www.perl.org';    # get() needs an absolute URL, scheme included
my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Windows Mozilla' );
$mech->get( $url );
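my $tree = HTML::TreeBuilder::XPath->new;
# parse_content parses the whole document and signals end of input in one call
$tree->parse_content( $mech->content );
for my $node ( $tree->findnodes('//td[
    preceding-sibling::td
        [contains(., "Foo") or contains(., "Bar")]
    and following-sibling::td[@colspan="4"]
]') ) {
    # print each matching cell on its own line
    print $node->as_text, "\n";
}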
Upvotes: 2
Views: 1752
Reputation: 56212
You can use this XPath:
//td[
preceding-sibling::td
[contains(., 'Foo') or contains(., 'Bar')]
and following-sibling::td[@colspan = 4]
]
It will return:
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
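One caveat worth checking: the condition following-sibling::td[@colspan = 4] holds for any cell that has some later colspan cell, so on a row with several sections the expression may also return cells beyond the next colspan cell. Below is a minimal sketch of the intended "stop at the next one" behaviour. It uses Python's standard library rather than the Perl from the question, and because ElementTree does not support the sibling axes it walks the cells directly instead of evaluating XPath. The extra TEXT4 and OTHER cells and the function name are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Sample row from the question, extended with a second section
# (TEXT4 / OTHER are made up here to show where collection stops).
ROW = """<tr>
  <td colspan="4">Foo || Bar</td>
  <td rowspan="4">TEXT1</td>
  <td valign="top">TEXT2</td>
  <td valign="top">TEXT3</td>
  <td colspan="4">VARIABLE</td>
  <td valign="top">TEXT4</td>
  <td colspan="4">OTHER</td>
</tr>"""

def cells_after_marker(row, markers=("Foo", "Bar")):
    """Return the text of each td after the first marker cell,
    stopping before the next td with colspan="4"."""
    collected = []
    collecting = False
    for td in row.findall("td"):
        text = "".join(td.itertext())
        if not collecting:
            # Keep skipping cells until one contains a marker word.
            if any(marker in text for marker in markers):
                collecting = True
            continue
        if td.get("colspan") == "4":
            break  # the next section header ends the run
        collected.append(text)
    return collected

result = cells_after_marker(ET.fromstring(ROW))
print(result)  # ['TEXT1', 'TEXT2', 'TEXT3']
```

With the single-section sample from the question, the XPath above and this walk select the same three cells; they differ only when more colspan cells follow.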
Upvotes: 1
Reputation: 167716
With XPath 2.0 and XQuery 1.0, the node-order operators << and >> are helpful for expressing conditions like yours. For example, with XQuery you can write:
let $tr := <tr>
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
.....
<td colspan="4">VARIABLE</td>
</tr>
let $td1 := $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]
let $td2 := $td1/following-sibling::td[@colspan = 4][1]
return $tr/td[. >> $td1 and . << $td2]
to find the td elements "between" those two other td elements.
Plain XPath 2.0 has no let clause, so there you will need to put everything into a single expression:
$tr/td[. >> $tr/td[contains(., 'Foo') or contains(., 'Bar')][1] and . << $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]/following-sibling::td[@colspan = 4][1]]
where $tr is the context node.
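To make the document-order idea concrete, here is a rough sketch in Python (standard library only, not XPath 2.0 itself) that treats each cell's position in the row as its document order. The start index plays the role of $td1, the end index plays the role of $td2, and the slice corresponds to the cells that come after the first and before the second. The function name and the choice to return an empty list when a boundary is missing are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

ROW = """<tr>
  <td colspan="4">Foo || Bar</td>
  <td rowspan="4">TEXT1</td>
  <td valign="top">TEXT2</td>
  <td valign="top">TEXT3</td>
  <td colspan="4">VARIABLE</td>
</tr>"""

def cells_between(row):
    """Texts of the td cells strictly between the two boundary cells."""
    tds = row.findall("td")
    texts = ["".join(td.itertext()) for td in tds]

    # Counterpart of $td1: the first cell containing Foo or Bar.
    start = next(
        (i for i, text in enumerate(texts) if "Foo" in text or "Bar" in text),
        None,
    )
    if start is None:
        return []

    # Counterpart of $td2: the first later cell with colspan="4".
    end = next(
        (i for i in range(start + 1, len(tds)) if tds[i].get("colspan") == "4"),
        None,
    )
    if end is None:
        return []

    # Positions after start and before end, in document order.
    return texts[start + 1:end]

between = cells_between(ET.fromstring(ROW))
print(between)  # ['TEXT1', 'TEXT2', 'TEXT3']
```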
Upvotes: 0