Jamie Bull

Reputation: 13539

How do I exclude elements from an XPath query?

I'm trying to select the ingredients in an ingredients list, but there are also tooltips scattered amongst them (on the BBC Good Food site).

As a stripped-down example:

<li class="ingredients-list__item" itemprop="ingredients">
  400g
  <a href="/glossary/new-potatoes" class="ingredients-list__glossary-link tooltip-processed">
    new potato
    <div id="gf-tooltip-0" class="gf-tooltip" role="tooltip">
      <div class="gf-tooltip__content">
        <div class="gf-tooltip__text">
          <p>unwanted tooltip</p>
        </div>
      </div>
    </div>
  </a>, halved if large
  <span class="ingredients-list__glossary-element" id="ingredients-glossary"></span>
</li>

I'm trying to end up with '400g new potato, halved if large', or equally good, ['400g', 'new potato', ', halved if large'].

Amongst other things I've tried:

s.xpath("//li[@class='ingredients-list__item'][not(div[@class='gf-tooltip'])]//text()").extract()

But this still returns the text in the tooltip div.

Upvotes: 1

Views: 500

Answers (1)

har07

Reputation: 89325

One possible way is to exclude any text node that has a tooltip div among its ancestors (broken into 2 lines for readability):

//li[@class='ingredients-list__item']
  //text()[not(ancestor::div[@class='gf-tooltip'])]
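
The attempt in the question most likely fails because its predicate is applied to the li element: it only asks whether the li has a direct child div with that class, and the following //text() then returns every descendant text node anyway. Moving the test onto the text nodes themselves avoids this. Below is a minimal sketch of that idea against the stripped-down markup from the question. It assumes lxml is used directly instead of the Scrapy selector (the XPath expression is unchanged), and the function name ingredient_parts is only illustrative.

```python
from lxml import html

# Stripped-down markup taken from the question.
SAMPLE = """
<li class="ingredients-list__item" itemprop="ingredients">
  400g
  <a href="/glossary/new-potatoes" class="ingredients-list__glossary-link tooltip-processed">
    new potato
    <div id="gf-tooltip-0" class="gf-tooltip" role="tooltip">
      <div class="gf-tooltip__content">
        <div class="gf-tooltip__text">
          <p>unwanted tooltip</p>
        </div>
      </div>
    </div>
  </a>, halved if large
  <span class="ingredients-list__glossary-element" id="ingredients-glossary"></span>
</li>
"""

# Select text nodes under the ingredient item, skipping any text node
# that has a tooltip div somewhere among its ancestors.
QUERY = (
    "//li[@class='ingredients-list__item']"
    "//text()[not(ancestor::div[@class='gf-tooltip'])]"
)


def ingredient_parts(markup):
    """Return the non-empty, stripped text fragments outside the tooltip."""
    tree = html.fromstring(markup)
    raw = tree.xpath(QUERY)
    # Whitespace-only nodes come from the indentation in the markup.
    return [text.strip() for text in raw if text.strip()]


parts = ingredient_parts(SAMPLE)
print(parts)
```

In Scrapy the same expression can be passed to s.xpath(...).extract(); the whitespace-only fragments would still need stripping afterwards, as done in the list comprehension here.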

Upvotes: 3
