Zaffar Saffee

Reputation: 6305

Exclude tag based on class and style in xpath

I have the following markup that I want to query with XPath:

<div class="buying">


<h1 class="parseasinTitle ">

<span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span>


</h1>
</div>

I just want to extract

Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking)

so I am using textContent with the following XPath query

$xpath_books->query('//span[@id="btAsinTitle"]')

but the result is

Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) [Kindle Edition]

I think I have to exclude <span style="text-transform: capitalize; font-size: 16px;"> to get what I want. How can I do that?

Upvotes: 2

Views: 578

Answers (2)

Gordon

Reputation: 317177

Your XPath does return only the node with that id, but because the DOM is a tree of linked DOMNodes, the returned node still contains its child nodes. When you access the returned span with nodeValue or textContent, PHP returns the combined text of all DOMText nodes below it, including the one inside the child span holding "[Kindle Edition]".

      SPAN
     /    \
   TEXT   SPAN
            \
            TEXT

More on that at DOMDocument in php
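
To see that tree for yourself, here is a minimal sketch that loads the markup from the question and lists the direct children of the outer span. The variable names ($html, $dom, $children) are my own and not from the question; the trim() call is only there to drop the trailing space in the first text node.

```php
<?php
// Markup copied from the question; variable names are illustrative assumptions.
$html = '<div class="buying"><h1 class="parseasinTitle ">'
      . '<span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) '
      . '<span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span>'
      . '</h1></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The query from the question: returns the outer span element.
$span = $xpath->query('//span[@id="btAsinTitle"]')->item(0);

// Collect "nodeName: text" for each direct child of that span.
$children = [];
foreach ($span->childNodes as $child) {
    $children[] = $child->nodeName . ': ' . trim($child->textContent);
}

print_r($children);
```

The first entry is the "#text" node with the title, the second is the nested span with "[Kindle Edition]". textContent on the outer span concatenates both.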

If you want to fetch only the first text part, you have to fetch the nodeValue of the first childNode:

echo $result->item(0)->childNodes->item(0)->nodeValue;

An alternative to fetch that string with XPath directly would be

echo $xpath->evaluate('string(//span[@id="btAsinTitle"]/text())');

See http://php.net/manual/en/domxpath.evaluate.php

If you want to return the whole DOMText node instead, use

//span[@id="btAsinTitle"]/text()
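
Put together, the variants can be compared in one small script. This is a sketch under the assumption that the markup looks exactly like the snippet in the question; the variable names ($full, $viaChild, $viaString, $textNode) are mine. Note that the text node ends with a space before the nested span, so you will probably want trim().

```php
<?php
// Markup copied from the question; variable names are illustrative assumptions.
$html = '<div class="buying"><h1 class="parseasinTitle ">'
      . '<span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) '
      . '<span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span>'
      . '</h1></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = $xpath->query('//span[@id="btAsinTitle"]');

// textContent of the outer span: includes "[Kindle Edition]".
$full = $result->item(0)->textContent;

// First child node of the span: only the leading text.
$viaChild = $result->item(0)->childNodes->item(0)->nodeValue;

// Let XPath convert the first text() node to a string.
$viaString = $xpath->evaluate('string(//span[@id="btAsinTitle"]/text())');

// Fetch the DOMText node itself instead of a string.
$textNode = $xpath->query('//span[@id="btAsinTitle"]/text()')->item(0);

echo trim($viaChild), PHP_EOL;
echo trim($viaString), PHP_EOL;
echo trim($textNode->nodeValue), PHP_EOL;
```

All three echo lines should print the title without "[Kindle Edition]", while $full still contains it.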

Upvotes: 4

Kirill Polishchuk

Reputation: 56202

Use this XPath:

//span[@id="btAsinTitle"]/text()

Upvotes: 4
