Sabbio

Reputation: 19

XPath that excludes some specific elements

This is a simplified version of the HTML of the page that I want to analyse:

<table class="class_1">
  <tbody>
    <tr class="class_2">
      <td class="class_3">&nbsp;</td>
      <td class="class_4">&nbsp;</td>
      <td class="class_5">&nbsp;</td>
    </tr>
    <tr class="class_2">
      <td class="class_3">&nbsp;</td>
      <td class="class_4">&nbsp;</td>
      <td class="class_5"><span class="class_6"></span>square</td>
    </tr>
    <tr class="class_2">
      <td class="class_3">&nbsp;</td>
      <td class="class_4">&nbsp;</td>
      <td class="class_5"><span class="class_7"></span>circle</td>
    </tr>
    <tr class="class_2">
      <td class="class_3">&nbsp;</td>
      <td class="class_4">&nbsp;</td>
      <td class="class_5"><span class="class_6"></span>triangle</td>
    </tr>
  </tbody>
</table>

You can find the page at https://sabbiobet.netsons.org/test.html

If you try this function in Google Sheets:

=IMPORTXML("https://sabbiobet.netsons.org/test.html";"//td[@class='class_5']")

you'll obtain all four class_5 cells: a blank one (&nbsp;), square, circle and triangle.

I need to obtain all the <td> elements with class="class_5" except those that contain only &nbsp; or a <span class="class_7">.

In other words, I want to obtain only these values: square and triangle.

Can somebody help me?

Upvotes: 1

Views: 765

Answers (2)

Alexey R.

Reputation: 8676

This should work:

//td[@class='class_5'][not(text()=' ')][not(./span[@class='class_7'])]

Note that the character between the quotes in [not(text()=' ')] is not a regular space but the no-break space, Unicode U+00A0, which is what &nbsp; decodes to. On Windows you can type it with Alt+0160, entering the digits on the numeric keypad.
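To check the logic of that expression outside Google Sheets, here is a minimal Python sketch. It is not the XPath itself: the standard library's ElementTree only supports a subset of XPath without not(), so the two exclusions are written as plain Python conditions. The SAMPLE string is a trimmed copy of the table from the question (with &nbsp; written as &#160; so the XML parser accepts it), and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # no-break space (U+00A0), what &nbsp; decodes to

# Trimmed copy of the question's table. &nbsp; is written as &#160;
# because ElementTree's XML parser does not know HTML named entities.
SAMPLE = """
<table class="class_1">
  <tr class="class_2"><td class="class_5">&#160;</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_6"></span>square</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_7"></span>circle</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_6"></span>triangle</td></tr>
</table>
"""


def cell_text(td):
    """Return all text inside a cell, including text after child elements."""
    return "".join(td.itertext())


def wanted_cells(root):
    """Mimic //td[@class='class_5'][not(text()=NBSP)][not(./span[@class='class_7'])]."""
    result = []
    for td in root.iterfind(".//td[@class='class_5']"):
        text = cell_text(td)
        if text == NBSP:  # cell holds only a no-break space
            continue
        if td.find("span[@class='class_7']") is not None:  # excluded span
            continue
        result.append(text)
    return result


root = ET.fromstring(SAMPLE)
print(wanted_cells(root))  # ['square', 'triangle']
print(NBSP == " ")         # False: a regular space would not match the blank cell
```

The last line shows why the character matters: comparing against a regular space would leave the blank cell in the result.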

Upvotes: 1

Markus

Reputation: 3317

The following XPath expression

//td[@class='class_5' and span and not(span[@class='class_7'])]

selects all td elements whose class attribute has the value class_5, that have a child span element, and that have no child span element whose class attribute has the value class_7. The blank cells are excluded because they contain no span at all.

Note that you could also use

//td[@class='class_5' and span[@class='class_6']]

to get the same result in this case.
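The "in this case" caveat can be illustrated with a small Python sketch. It uses the standard library's ElementTree, whose XPath subset has no not() and no nested predicates, so part of each condition is written in Python. The SAMPLE string is a trimmed copy of the question's table, and the extra row with class_8 is hypothetical: it is not on the original page and is only there to show where the two expressions diverge.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's table (&nbsp; written as &#160; for the XML parser).
SAMPLE = """
<table class="class_1">
  <tr class="class_2"><td class="class_5">&#160;</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_6"></span>square</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_7"></span>circle</td></tr>
  <tr class="class_2"><td class="class_5"><span class="class_6"></span>triangle</td></tr>
</table>
"""


def texts(cells):
    """Collect the full text of each cell."""
    return ["".join(td.itertext()) for td in cells]


def has_span_but_not_class_7(root):
    """Mimic //td[@class='class_5' and span and not(span[@class='class_7'])]."""
    return [td for td in root.iterfind(".//td[@class='class_5'][span]")
            if td.find("span[@class='class_7']") is None]


def has_span_class_6(root):
    """Mimic //td[@class='class_5' and span[@class='class_6']]."""
    return [td for td in root.iterfind(".//td[@class='class_5']")
            if td.find("span[@class='class_6']") is not None]


root = ET.fromstring(SAMPLE)
print(texts(has_span_but_not_class_7(root)))  # ['square', 'triangle']
print(texts(has_span_class_6(root)))          # ['square', 'triangle']

# Hypothetical extra row (not in the question) with a third span class.
row = ET.SubElement(root, "tr", {"class": "class_2"})
cell = ET.SubElement(row, "td", {"class": "class_5"})
span = ET.SubElement(cell, "span", {"class": "class_8"})
span.tail = "star"

print(texts(has_span_but_not_class_7(root)))  # ['square', 'triangle', 'star']
print(texts(has_span_class_6(root)))          # ['square', 'triangle']
```

So the first expression is an exclusion list (anything with a span except class_7), while the second is an inclusion list (only class_6). Which one fits depends on whether other span classes can appear on the real page.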

Upvotes: 1
