I can't seem to find a topic that answers this, so I'm asking it myself.
Since this is a generic question whose answer applies to most documents, I don't think a specific code example is necessary.
Using XPath, I want to select all table nodes that do not nest other tables, i.e. that have no descendant table elements.
I also want to discard all tables whose value consists only of whitespace.
I have tried this:
//table[not(child::table) and normalize-space(.)]
but it's not working.
What is the right way to do it?
Assuming that you are scraping (X)HTML, and noting that a table cannot have another table as a direct child, you are most likely looking for descendant table elements rather than direct child elements:
table[not(descendant::table)]
Given the following XML:
<xml>
  <table id="hasDescendent">
    <tr>
      <td>
        <table id="Inner Descendent"/>
      </td>
    </tr>
  </table>
  <table id="directChild">
    <table id="Inner Direct Child" />
  </table>
  <table id="nochild">
  </table>
</xml>
the XPath //table[not(descendant::table)] returns only the tables that contain no nested table, namely those with ids Inner Descendent, Inner Direct Child and nochild.
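To check which tables that predicate keeps, here is a minimal sketch using only Python's standard library. ElementTree implements just a small XPath subset with no not() function and no descendant:: axis, so the sketch emulates the predicate by hand: t.find(".//table") is None plays the role of not(descendant::table). The helper name tables_without_nested is hypothetical.

```python
import xml.etree.ElementTree as ET

# The XML from this answer, unchanged apart from indentation.
XML = """<xml>
  <table id="hasDescendent">
    <tr>
      <td>
        <table id="Inner Descendent"/>
      </td>
    </tr>
  </table>
  <table id="directChild">
    <table id="Inner Direct Child" />
  </table>
  <table id="nochild">
  </table>
</xml>"""


def tables_without_nested(root):
    """Hypothetical helper emulating //table[not(descendant::table)].

    root.iter("table") visits every table in document order;
    ".//table" searches all levels below t (not t itself).
    """
    return [t for t in root.iter("table") if t.find(".//table") is None]


root = ET.fromstring(XML)
print([t.get("id") for t in tables_without_nested(root)])
# ['Inner Descendent', 'Inner Direct Child', 'nochild']
```

Both outer tables are dropped: hasDescendent because its inner table sits under tr/td, directChild because its inner table is a direct child.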
Let's use the following HTML fragment as an example:
<div>
  <table id="1">
  </table>
  <table id="2">
    <table>
      <tr>
        <td>2</td>
      </tr>
    </table>
  </table>
  <table id="3">
    <div>I'm the one you wanted to find</div>
  </table>
</div>
According to your description, the first table should be discarded because it contains only whitespace, and the second table should also be discarded because it has another table inside it.
The following XPath expression matches only the third table:
/div/table[(not(child::table) and normalize-space(.))]
Demo (using the xmllint tool):
$ xmllint index.html --xpath '/div/table[(not(child::table) and normalize-space(.))]'
<table id="3">
<div>I'm the one you wanted to find</div>
</table>
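The same predicate can be emulated with Python's standard library as a rough cross-check; this is a sketch, not how xmllint evaluates it, and the helper name select_tables and its deep flag are hypothetical. "".join(t.itertext()) builds the element's string value, and stripping only space, tab, CR and LF mirrors what normalize-space(.) treats as whitespace. The deep flag switches between child::table and descendant::table. On this fragment both give the same result, because the nested table in table 2 is a direct child; when the inner table sits inside tr/td, as in real HTML, only descendant::table catches it.

```python
import xml.etree.ElementTree as ET

# The fragment from this answer, unchanged apart from indentation.
FRAGMENT = """<div>
  <table id="1">
  </table>
  <table id="2">
    <table>
      <tr>
        <td>2</td>
      </tr>
    </table>
  </table>
  <table id="3">
    <div>I'm the one you wanted to find</div>
  </table>
</div>"""

XPATH_WS = " \t\r\n"  # the only characters XPath 1.0 treats as whitespace


def select_tables(root, deep=True):
    """Hypothetical helper emulating
    /div/table[not(AXIS::table) and normalize-space(.)].

    deep=True  -> AXIS is descendant (".//table")
    deep=False -> AXIS is child ("table")
    """
    nested = ".//table" if deep else "table"
    result = []
    for t in root.findall("table"):  # root is <div>, so this is /div/table
        has_nested = t.find(nested) is not None
        has_text = "".join(t.itertext()).strip(XPATH_WS) != ""
        if not has_nested and has_text:
            result.append(t)
    return result


root = ET.fromstring(FRAGMENT)
print([t.get("id") for t in select_tables(root)])              # ['3']
print([t.get("id") for t in select_tables(root, deep=False)])  # ['3']

# Nesting through tr/td: only the descendant axis sees the inner table.
wrapped = ET.fromstring(
    '<div><table id="outer"><tr><td>x<table id="inner">'
    "<tr><td>y</td></tr></table></td></tr></table></div>"
)
print([t.get("id") for t in select_tables(wrapped)])              # []
print([t.get("id") for t in select_tables(wrapped, deep=False)])  # ['outer']
```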