GalB1t

Reputation: 265

Extracting the first table's first row

I'm trying to extract the first row (tr) of the first table (table) element in a parsed XML document.

I thought the following would do the trick:

//table[1]//tr[1]//text()

Yet it returns too many nodes. For example, on this page I want it to return:

Wikimedia Commons has media related to 
Public transport schedules

but the text of the following node, which is clearly not part of the first row, is returned as well:

<div style="font-size:110%"><a href="/wiki/Public_transport" title="Public transport">Public transport</a></div>

(Only the text is returned, but I pasted the full node so that it is easier to find.)

Upvotes: 1

Views: 1680

Answers (2)

Ian Roberts

Reputation: 122364

This is a subtlety of how // is defined: it abbreviates /descendant-or-self::node()/, so the [1] predicate applies to the child step that follows it. //table[1] therefore does not mean "the first table" but rather "every table that is the first table element in its respective parent". The same applies to the tr step: you'll get the first row in the thead and the first row in the tbody.
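The per-parent behaviour can be reproduced with a small sketch. It uses Python's standard-library ElementTree, whose limited XPath subset treats a positional predicate the same way; the sample document and its id attributes are made up for illustration and are not taken from the page in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two tables, each the first <table> child of its own <div>.
DOC = """<body>
<div><table id="a"><tr><td>A1</td></tr><tr><td>A2</td></tr></table></div>
<div><table id="b"><tr><td>B1</td></tr></table></div>
</body>"""

root = ET.fromstring(DOC)

# ".//table[1]" keeps every table that is the first <table> inside its parent,
# so both tables match, not just the first one in the document.
first_in_parent = [t.get("id") for t in root.findall(".//table[1]")]
print(first_in_parent)  # ['a', 'b']

# The same rule applies to the tr step: one "first row" per matching parent.
rows = ["".join(tr.itertext()) for tr in root.findall(".//table[1]//tr[1]")]
print(rows)  # ['A1', 'B1']
```

Both tables are selected because each is the first table within its own div, which is why the original expression returns text from more than one row.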

If you want the first row of the first table in the whole document you need to use parentheses:

(//table//tr)[1]

This says "find all rows in all tables, then from that list select just the first element in document order".
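As a rough illustration of "collect everything, then take the first in document order", here is a sketch using Python's standard-library ElementTree. ElementTree does not support parenthesized XPath expressions, so indexing the result list stands in for the (...)[1] step; with a full XPath 1.0 engine you could evaluate (//table//tr)[1]//text() directly. The sample markup is a simplified, hypothetical stand-in for the page in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page: two tables in separate parents.
DOC = """<body>
<div><table><tbody>
<tr><td>Wikimedia Commons has media related to <i>Public transport schedules</i></td></tr>
</tbody></table></div>
<div><table><tbody>
<tr><td><div>Public transport</div></td></tr>
</tbody></table></div>
</body>"""

root = ET.fromstring(DOC)

# Step 1: find all rows in all tables (returned in document order here).
all_rows = root.findall(".//table//tr")

# Step 2: take only the first element of that list, emulating (//table//tr)[1].
first_row = all_rows[0]

# Join the row's text nodes and normalise whitespace.
text = " ".join("".join(first_row.itertext()).split())
print(text)  # Wikimedia Commons has media related to Public transport schedules
```

Only the first row's text is collected; the "Public transport" row in the second table is found in step 1 but discarded in step 2.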

Upvotes: 4

Dillon

Reputation: 106

You need to extract the text from the td elements, not from the tr.

Give this a try:

//table[1]//tr[1]//td//text()

Upvotes: 0
