diabolist

Reputation: 4099

XPath to get siblings between two elements

Given the following markup, I need to select the middle tr elements:

<tr class="H03">
  <td>Artist</td>
  ...
</tr>
<tr class="row_alternate">
  <td>LIMP</td>
  <td>Orion</td>
  ...
</tr>
<tr class="row_normal">
  <td>SND</td>
  <td>Tender Love</td>
  ...
</tr>
<tr class="report_total">
  <td>&nbsp;</td>
  <td>&nbsp;</td>
  ...
</tr>

That is, every sibling tr between <tr class="H03"> and <tr class="report_total">. I'm scraping with Mechanize and Nokogiri, so I'm limited to their XPath support. My best attempt, after looking at various Stack Overflow questions, is:

page.search('/*/tr[@class="H03"]/following-sibling::tr[count(. | /*/tr[@class="report_total"]/preceding-sibling::tr)=count(/*/tr[@class="report_total"]/preceding-sibling::tr)]')

This returns an empty array, and it is so complicated that my limited XPath-fu is completely overwhelmed!

Upvotes: 1

Views: 605

Answers (2)

har07

Reputation: 89295

You can try the following XPath:

//tr[@class='H03']/following-sibling::tr[following-sibling::tr[@class='report_total']]

This XPath selects every <tr> that follows tr[@class='H03'] as a sibling and that itself has a following sibling tr[@class='report_total']. In other words, the selected <tr> elements are the ones located after the H03 row and before the report_total row.

Upvotes: 2

David Lio

Reputation: 327

Mechanize parses pages with Nokogiri, whose node helper methods are useful here.

Presuming you are doing something like the following:

require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.website.com')
start_tr = page.at('.H03')

At this point, start_tr will be a Nokogiri::XML::Element for the first tr listed in your question.

You can then iterate through siblings with:

next_tr = start_tr.next_sibling

Repeat this until you reach the tr at which you want to stop, collecting each row along the way.

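trs = []

# Collect every element between start_tr and the report_total row.
until next_tr.nil? || next_tr['class'] == 'report_total'
    # next_sibling also returns whitespace text nodes, so keep elements only.
    trs << next_tr if next_tr.element?
    next_tr = next_tr.next_sibling
end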

If you want the range to include the start and stop rows (H03 and report_total), adjust the code above to add them to the trs array.

Upvotes: 1
