Reputation: 178
I'm having a hard time figuring out how to do something that seems very simple. Let's say I have an HTML table such as the following:
<table><tbody>
<tr><th>First header</th></tr>
<tr../>
<tr../>
<tr../>
<tr><th>Second header</th></tr>
<tr../>
</tbody></table>
I want all three rows that immediately follow the "First header". So far I have '/table/tbody/tr[preceding-sibling::tr/th[1]/text()="First header"]', but it's giving me every single row in the table after the "First header". What am I doing wrong?
Edit: I'm working with code that passes in the header as a variable, so I'm parsing the table without knowing whether there is another header at the end or what it would be. More generally: given a header string, retrieve all following rows until the next header or the end of the table.
Upvotes: 1
Views: 2361
Reputation: 178
I got this after more trial and error:
'/table/tbody/tr[preceding-sibling::tr[th/text()="First header"] = preceding-sibling::tr[th][1]]'
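Which translates to English: get all rows preceded by the "First header" row, where that row is also the nearest preceding row that contains a header. Note that = compares the string values of the two rows rather than their identity, so this relies on the header text being unique within the table.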
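For anyone doing this from code rather than in a single XPath expression, the same rule (keep a row only if the nearest preceding header row is the requested one) can be sketched as a plain loop. This is a minimal sketch using Python's standard library, not the original code: the function name rows_under_header and the SAMPLE table (with the elided rows filled in as 1 to 4) are made up for illustration, and it assumes the table parses as well-formed XML with all rows in a single tbody.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's table with the elided rows filled in.
SAMPLE = """
<table><tbody>
  <tr><th>First header</th></tr>
  <tr>1</tr>
  <tr>2</tr>
  <tr>3</tr>
  <tr><th>Second header</th></tr>
  <tr>4</tr>
</tbody></table>
"""

def rows_under_header(table_xml, header):
    """Return the <tr> elements between the row whose <th> text equals
    `header` and the next header row (or the end of the table)."""
    root = ET.fromstring(table_xml)
    collected = []
    current = None  # text of the nearest preceding header row, if any
    for tr in root.iter("tr"):  # document order
        th = tr.find("th")
        if th is not None:
            # A header row starts a new section. Whitespace is stripped here,
            # which is an assumption; the XPath above compares text exactly.
            current = "".join(th.itertext()).strip()
        elif current == header:
            collected.append(tr)
    return collected

print([tr.text for tr in rows_under_header(SAMPLE, "First header")])   # ['1', '2', '3']
print([tr.text for tr in rows_under_header(SAMPLE, "Second header")])  # ['4']
```

Because the loop only remembers the most recent header, it does not need to know whether another header follows, which covers the "until the next header or end of table" case from the edit.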
Upvotes: 0
Reputation: 474221
You can get every tr that has a preceding sibling whose th text equals First header and a following sibling that contains a th:
//tr[preceding-sibling::tr/th = 'First header' and following-sibling::tr/th]
Demo (using xmllint):
$ xmllint index.html --xpath "//tr[preceding-sibling::tr/th = 'First header' and following-sibling::tr/th]"
<tr>1</tr><tr>2</tr><tr>3</tr>
where index.html contains:
<table>
<tbody>
<tr>
<th>First header</th>
</tr>
<tr>1</tr>
<tr>2</tr>
<tr>3</tr>
<tr>
<th>Second header</th>
</tr>
<tr>4</tr>
</tbody>
</table>
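It may be worth checking how this predicate behaves when the requested header is the last one, or when more than two headers exist. The sketch below does not run an XPath engine; it emulates the two conditions of the expression in plain Python (standard library only) so the behaviour can be inspected. The function name select_like_answer and the THREE table are made up for illustration, and all rows are assumed to be siblings in a single tbody.

```python
import xml.etree.ElementTree as ET

# The demo table from above, plus a hypothetical table with three headers.
SAMPLE = ("<table><tbody><tr><th>First header</th></tr><tr>1</tr><tr>2</tr>"
          "<tr>3</tr><tr><th>Second header</th></tr><tr>4</tr></tbody></table>")
THREE = ("<table><tbody><tr><th>A</th></tr><tr>1</tr><tr><th>B</th></tr>"
         "<tr>2</tr><tr><th>C</th></tr><tr>3</tr></tbody></table>")

def th_texts(tr):
    """String values of the <th> children of one row."""
    return ["".join(th.itertext()) for th in tr.findall("th")]

def select_like_answer(table_xml, header):
    """Emulate //tr[preceding-sibling::tr/th = header
                    and following-sibling::tr/th]."""
    rows = list(ET.fromstring(table_xml).iter("tr"))
    selected = []
    for i, tr in enumerate(rows):
        # Some earlier row has a <th> whose text equals the header.
        header_before = any(header in th_texts(r) for r in rows[:i])
        # Some later row contains a <th> at all.
        header_after = any(r.find("th") is not None for r in rows[i + 1:])
        if header_before and header_after:
            selected.append(tr)
    return selected

def texts(rows):
    """Flatten each selected row to its text for display."""
    return ["".join(tr.itertext()).strip() for tr in rows]

print(texts(select_like_answer(SAMPLE, "First header")))   # ['1', '2', '3']
print(texts(select_like_answer(SAMPLE, "Second header")))  # []
print(texts(select_like_answer(THREE, "A")))               # ['1', 'B', '2']
```

Under these assumptions the expression matches the demo output for the first header, returns nothing for a header with no later header row, and with three headers also picks up the middle header row and the rows after it.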
Upvotes: 1