DrMikey

Reputation: 405

XPath Exclude Text From Child Element

I'm looking to get the output:

 50ml milk

From the following code:

<ul class="ingredients-list__group">
  <li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
<div class="tooltip">
      <h2
        class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
        <p>One of the most widely used ingredients, milk is often referred to as a complete food. While cow…</p>
        </div>
        </a>
  </li>
</ul>

Currently I'm using the XPath:

//ul[@class="ingredients-list__group"]/li

But getting:

50ml milk Milk mill-kOne of the most widely used ingredients, milk is often referred to as a complete food. While cow… 

How do I exclude the stuff within the div/tooltip?

Upvotes: 2

Views: 2353

Answers (2)

RomanPerekhrest

Reputation: 92854

With XPath 2.0:

//ul[@class="ingredients-list__group"]/li/concat(./text()[1], ./a/text()[1])

With XPath 1.0:

concat(//ul[@class="ingredients-list__group"]/li/text()[1], //ul[@class="ingredients-list__group"]/li/a/text()[1])

Upvotes: 2

Michael Kay

Reputation: 163262

You can select the relevant text nodes using

//ul[@class="ingredients-list__group"]//
   text()[not(ancestor::div[@class='tooltip'])]

If you're using XPath 2.0, you can wrap this in a call to string-join() to combine the text nodes into a single string. If you're stuck with 1.0, you'll have to return the multiple text nodes to the calling application and concatenate them in the host-language code.
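As a rough illustration of the "concatenate in the host language" route, here is a minimal Python sketch. It is an assumption-laden example, not part of the original answer: Python's standard-library ElementTree implements only a subset of XPath (no text() node test or ancestor axis), so the sketch emulates the selection above by walking the tree and skipping any div with class "tooltip". The names SNIPPET, visible_text_nodes and ingredient_text are hypothetical helpers, and the markup from the question is assumed to parse as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Markup from the question, assumed here to be well-formed XML.
SNIPPET = """<ul class="ingredients-list__group">
  <li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
<div class="tooltip">
      <h2
        class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
        <p>One of the most widely used ingredients, milk is often referred to as a complete food. While cow…</p>
        </div>
        </a>
  </li>
</ul>"""


def visible_text_nodes(element):
    """Yield text nodes under `element`, skipping <div class="tooltip"> subtrees."""
    if element.tag == "div" and element.get("class") == "tooltip":
        return
    if element.text:
        yield element.text
    for child in element:
        yield from visible_text_nodes(child)
        # Text after a child's end tag belongs to the parent, so it is kept
        # even when the child itself was a skipped tooltip.
        if child.tail:
            yield child.tail


def ingredient_text(markup):
    """Concatenate the non-tooltip text of every matching <ul>."""
    root = ET.fromstring(markup)
    pieces = []
    for ul in root.iter("ul"):  # iter() includes the root element itself
        if ul.get("class") == "ingredients-list__group":
            pieces.extend(visible_text_nodes(ul))
    # Join the text nodes, then collapse the whitespace left by the markup.
    return " ".join("".join(pieces).split())


print(ingredient_text(SNIPPET))  # 50ml milk
```

The whitespace collapse at the end is a design choice of this sketch: the raw text nodes include the newlines and indentation between tags, so joining them without normalization would not give the clean string asked for in the question.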

Upvotes: 0
