Michael

Reputation: 23

PHP XPath substring-after only returning first result

I am doing some HTML scraping and have hit a wall with this one query. I am trying to return a set of values from the following HTML page structure:

<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>

I am trying to get just the prices ("$20.00", "$30.00", "$40.00") returned as a list. If I use the following XPath:

id('product-grid')//div[@class="price"]

I get a list of the full strings, such as "Cash Price: $20.00". If I try the following query:

substring-after(id('product-grid')//div[@class="price"], "Price: ")

I get the correct output, but only for the first result. Does anyone know how I can get all of the results?

I am running PHP 5.3.3 with libxml 2.7.8 for the XPath. I am evaluating the XPath as follows:

$xpath = new DOMXPath($html);
$resultset = $xpath->query($query);

I have been googling like mad trying to find out why this is happening! Please help!

Upvotes: 2

Views: 1664

Answers (3)

Dimitre Novatchev

Reputation: 243579

The desired processing cannot be expressed as a single XPath 1.0 expression, because by definition any function that expects a string argument but is given a node-set uses the string value of only the first node (in document order) of that node-set.

Also, unlike XPath 2.0, XPath 1.0 does not allow a function call as a location step.

Therefore, one solution is to issue this XPath expression:

substring-after((id('product-grid')//div[@class="price"])[$k], "Price: ")

N times, substituting $k in each expression with 1, 2, ..., N, where N is the result of evaluating another XPath expression:

count(id('product-grid')//div[@class="price"])
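
The N-evaluations approach above could be sketched in PHP roughly as follows. This is a minimal illustration, not a definitive implementation. It assumes the markup from the question (with div elements carrying class="price"), and it swaps id('product-grid') for //*[@id="product-grid"] so that the sketch does not depend on the id attribute being recognised as an ID type. The variable names $base, $n and $prices are made up for the example. Note that evaluate() is used instead of query(), because count() and substring-after() return a number and a string rather than a node list.

```php
<?php
// Sample markup from the question (assumed to use <div class="price">).
$doc = new DOMDocument();
$doc->loadHTML('<div id="product-grid"><ul>
    <li><div class="price">Cash Price: $20.00</div></li>
    <li><div class="price">Cash Price: $30.00</div></li>
    <li><div class="price">Cash Price: $40.00</div></li>
</ul></div>');
$xpath = new DOMXPath($doc);

// Assumption: attribute test used in place of id('product-grid').
$base = '//*[@id="product-grid"]//div[@class="price"]';

// N = number of matching nodes; evaluate() returns a float for count().
$n = (int) $xpath->evaluate(sprintf('count(%s)', $base));

// Evaluate substring-after() once per node, with $k = 1..N.
$prices = array();
for ($k = 1; $k <= $n; $k++) {
    $expr = sprintf("substring-after((%s)[%d], 'Price: ')", $base, $k);
    $prices[] = $xpath->evaluate($expr);
}

print_r($prices);
```

Each iteration issues one XPath evaluation, so for long lists it may be simpler to query the nodes once and trim the prefix in PHP instead.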

With XPath 2.0, this can be done with a single, simple expression:

id('product-grid')//div[@class="price"]/substring-after(., "Price: ")

which when evaluated produces exactly the wanted sequence of strings.

Upvotes: 1

Tom

Reputation: 1711

You have to apply substring-after after getting your list.

id('product-grid')//div[@class="price"][substring-after(., 'Price: ')]

This should work.

EDIT: This seems to work. However, I can't check the return value because I don't know how to retrieve the substring value. What do you use?
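
One way to check what this expression returns is a quick sketch like the one below. It assumes the markup from the question and, as an assumption of this example only, uses //*[@id="product-grid"] rather than id('product-grid'). In XPath 1.0 a predicate only filters the node-set, so the expectation is that the nodes come back with their full text and the substring is not applied to the returned values. The names $nodes and $values are illustrative.

```php
<?php
// Sample markup from the question.
$doc = new DOMDocument();
$doc->loadHTML('<div id="product-grid"><ul>
    <li><div class="price">Cash Price: $20.00</div></li>
    <li><div class="price">Cash Price: $30.00</div></li>
    <li><div class="price">Cash Price: $40.00</div></li>
</ul></div>');
$xpath = new DOMXPath($doc);

// The predicate keeps nodes whose substring-after() result is non-empty;
// it does not change what each selected node contains.
$nodes = $xpath->query(
    '//*[@id="product-grid"]//div[@class="price"][substring-after(., "Price: ")]'
);

// Collect the text of each returned node to inspect it.
$values = array();
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

print_r($values);
```

If the values still start with "Cash Price: ", the prefix has to be removed in a separate step, either with one substring-after() evaluation per node or in PHP.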

Upvotes: 1

Stefan Gehrig

Reputation: 83672

Sorry, but I don't think this is possible in one step. As far as I know, XPath 1.0 does not support function calls at the end of an XPath path. The answer here indicates the same.

Furthermore, you don't need id('product-grid') as the first step of the path, because the id is on the root element and does not need to be selected explicitly. If your sample XML is just a fragment of a larger XML document, id() might be necessary, though.

The following works as expected:

// Load the sample markup from the question.
$xml = new DOMDocument();
$xml->loadXML('<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>');
$xpath = new DOMXPath($xml);

// Select every price node, then keep the text from the "$" sign onward in PHP.
foreach ($xpath->query('//div[@class="price"]') as $n) {
    var_dump(substr($n->nodeValue, strpos($n->nodeValue, '$')));
}

Upvotes: 1
