Reputation: 265
I'm trying to extract the first table row (tr) of the first table (table) object in a parsed XML document.
I thought that the following would do the trick:
//table[1]//tr[1]//text()
Yet it returns too many nodes. For example, on this page I want it to return:
Wikimedia Commons has media related to
Public transport schedules
but the text of the following node, which is clearly not part of the first row, is returned as well:
<div style="font-size:110%"><a href="/wiki/Public_transport" title="Public transport">Public transport</a></div>
(only its text is returned, but I have pasted the full node to make it easier to find)
Upvotes: 1
Views: 1680
Reputation: 122364
This is a subtlety of the way // is defined: //table[1] is shorthand for /descendant-or-self::node()/child::table[1], so it does not mean "the first table" but rather "every table that is the first table element in its respective parent". The same applies to the tr step: you'll get the first row in the thead and the first row in the tbody.
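To see the per-parent behaviour concretely, here is a minimal sketch using Python's standard xml.etree.ElementTree. ElementTree implements only a subset of XPath, but its [n] predicate follows the same rule: an element is kept only if it is the nth child with that tag of its own parent. The sample markup is hypothetical (a simplified, well-formed stand-in for the page in the question), with two tables in separate div elements and the first table split into thead and tbody.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page: two tables in separate
# <div>s, the first table split into <thead> and <tbody>.
SAMPLE = """
<html><body>
  <div>
    <table>
      <thead><tr><td>Wikimedia Commons has media related to <i>Public transport schedules</i></td></tr></thead>
      <tbody><tr><td>Second row</td></tr></tbody>
    </table>
  </div>
  <div>
    <table>
      <tr><td><div>Public transport</div></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# [1] keeps an element only if it is the first child with that tag of its
# own parent: both tables qualify (each is first in its <div>), and inside
# the first table both the <thead> row and the <tbody> row qualify.
rows = root.findall(".//table[1]//tr[1]")
texts = ["".join(tr.itertext()).strip() for tr in rows]

for text in texts:
    print(text)
# Wikimedia Commons has media related to Public transport schedules
# Second row
# Public transport
```

Three rows come back instead of one, which is the same symptom as in the question.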
If you want the first row of the first table in the whole document you need to use parentheses:
(//table//tr)[1]
This says "find all rows in all tables, then from that list select just the first element in document order".
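ElementTree does not accept the parenthesized form (//table//tr)[1], so a rough equivalent under that library is to collect every tr below a table and keep the first item of the resulting list, which comes out in document order. The sketch below reuses the same hypothetical markup as a stand-in for the page, and mirrors the question's trailing //text() by returning the non-blank text pieces of that single row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page (two tables, the first
# split into <thead> and <tbody>).
SAMPLE = """
<html><body>
  <div>
    <table>
      <thead><tr><td>Wikimedia Commons has media related to <i>Public transport schedules</i></td></tr></thead>
      <tbody><tr><td>Second row</td></tr></tbody>
    </table>
  </div>
  <div>
    <table>
      <tr><td><div>Public transport</div></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Emulates (//table//tr)[1]: gather all rows inside tables, then take the
# first one in document order.
rows = root.findall(".//table//tr")
first_row = rows[0] if rows else None

# Emulates appending //text(): every text piece under that row, with
# whitespace-only pieces dropped.
pieces = [] if first_row is None else [
    t.strip() for t in first_row.itertext() if t.strip()
]
print(pieces)
# ['Wikimedia Commons has media related to', 'Public transport schedules']
```

Only the text of the first row is returned; "Second row" and "Public transport" are no longer included.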
Upvotes: 4
Reputation: 106
You need to extract the text from the td elements, not from tr.
Give this a try.
//table[1]//tr[1]//td//text()
Upvotes: 0