thebourneid

Reputation: 113

Find nodes until condition using XPath

Which XPath expression finds all td nodes that follow the one containing the text Foo or Bar, and stops before the next <td colspan="4">, whose text is unknown? Thanks.

<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
...
<td colspan="4">VARIABLE</td>
...

UPDATE:

use strict; 
use warnings;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

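# The URL needs an explicit scheme; a bare host name is not an absolute URL.
my $url = 'http://www.perl.org';

my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Windows Mozilla' );
$mech->get( $url );

my $tree = HTML::TreeBuilder::XPath->new;

$tree->parse( $mech->content );
$tree->eof;    # signal end of input so the tree is complete

# Select every td that has an earlier sibling td containing "Foo" or "Bar"
# and a later sibling td with colspan="4".
for my $node ($tree->findnodes('//td[
                            preceding-sibling::td
                            [contains(., "Foo") or contains(., "Bar")]
                            and following-sibling::td[@colspan="4"]
                            ]')) {

    print $node->as_text, "\n";

}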

Upvotes: 2

Views: 1752

Answers (2)

Kirill Polishchuk

Reputation: 56212

You can use this XPath:

//td[
      preceding-sibling::td
            [contains(., 'Foo') or contains(., 'Bar')] 
      and following-sibling::td[@colspan = 4]
]

For the sample markup in the question, it returns:

<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
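
One caveat with the expression above: it only requires that some td with colspan="4" follows, so when a row contains several such cells it keeps matching past the first one. A possible tightening, offered as a sketch rather than as part of the original answer, is to test the nearest preceding colspan="4" cell instead. It assumes the Foo/Bar cell itself carries colspan="4", as it does in the sample:

```xpath
//td[not(@colspan = 4)]
    [preceding-sibling::td[@colspan = 4][1]
        [contains(., 'Foo') or contains(., 'Bar')]]
    [following-sibling::td[@colspan = 4]]
```

Because preceding-sibling is a reverse axis, [1] picks the closest earlier colspan="4" cell, so only the cells between the Foo/Bar cell and the next colspan="4" cell match. When embedding this in a single-quoted Perl string, switch the inner literals to double quotes, as the question's code does.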

Upvotes: 1

Martin Honnen

Reputation: 167716

XPath 2.0 and XQuery 1.0 provide the node-order operators << and >>, which are helpful for expressing conditions like yours. With XQuery, for example, you can write

let $tr := <tr>
<td colspan="4">Foo || Bar</td>
<td rowspan="4">TEXT1</td>
<td valign="top">TEXT2</td>
<td valign="top">TEXT3</td>
.....
<td colspan="4">VARIABLE</td>
</tr>
let $td1 := $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]
let $td2 := $td1/following-sibling::td[@colspan = 4][1]
return $tr/td[. >> $td1 and . << $td2]

to find the td elements "between" those two other td elements.

XPath 2.0 has no let clause, so there you need to put everything into a single expression:

$tr/td[. >> $tr/td[contains(., 'Foo') or contains(., 'Bar')][1] and . << $tr/td[contains(., 'Foo') or contains(., 'Bar')][1]/following-sibling::td[@colspan = 4][1]]

where $tr is the context node.

Upvotes: 0
