Reputation: 4099
With the following markup I need to get the middle tr's
<tr class="H03">
<td>Artist</td>
...
<tr class="row_alternate">
<td>LIMP</td>
<td>Orion</td>
...
</tr>
<tr class="row_normal">
<td>SND</td>
<td>Tender Love</td>
...
</tr>
<tr class="report_total">
<td> </td>
<td> </td>
...
</tr>
That is every sibling tr between <tr class="H03">
and <tr class="report_total">
. I'm scraping using mechanize and nokogiri, so am limited to their xpath support. My best attempt after looking at various StackOverflow questions is
page.search('/*/tr[@class="H03"]/following-sibling::tr[count(. | /*/tr[@class="report_total"]/preceding-sibling::tr)=count(/*/tr[@class="report_total"]/preceding-sibling::tr)]')
which returns an empty array, and is so ridiculously complicated that my limited xpath fu is completely overwhelmed!.
Upvotes: 1
Views: 605
Reputation: 89295
You can try the following xpath :
//tr[@class='H03']/following-sibling::tr[following-sibling::tr[@class='report_total']]
Above xpath select all <tr>
following tr[@class='H03']
, where <tr>
have following sibling tr[@class='report_total']
or in other words selected <tr>
are located before tr[@class='report_total']
.
Upvotes: 2
Reputation: 327
Mechanize has a few helper methods here that would be useful to employ.
presuming you are doing something like the following:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.website.com')
start_tr = page.at('.H03')
At this point, tr will be a nokogiri xml element of the first tr you list in your question.
You can then iterate through siblings with:
next_tr = start_tr.next_sibling
Do this until you hit the tr at which you want to stop.
trs = Array.new
until next_tr.attributes['class'].name == 'report_total'
next_tr = next_tr.next_sibling
trs << next_tr
end
If you want the range to be inclusive of the start and stop trs (H03 and report_total) just tweak the code above to include them in the trs array.
Upvotes: 1