Reputation: 23
I am doing some HTML scraping and have hit a wall with this one query. I am trying to return a set of values from the following HTML page structure:
<div id="product-grid">
<ul>
<li><div class="price">Cash Price: $20.00</div></li>
<li><div class="price">Cash Price: $30.00</div></li>
<li><div class="price">Cash Price: $40.00</div></li>
</ul>
</div>
I am trying to get the "$20.00" prices returned in a list. If I use the following XPath:
id('product-grid')//p[@class="price"]
I get a result list of all the full strings, such as "Cash Price: $40.00". If I try the following query:
substring-after(id('product-grid')//p[@class="price"] , "Price: ")
I get the correct output, but only get the first result. Anyone know how I can get all results?
I am running PHP 5.3.3 with libxml 2.7.8 for the XPath. I am calling the XPath as follows:
$xpath = new DOMXPath( $html );
$resultset= $xpath->query($query);
I have been googling like mad trying to find out why this is happening! Please help!
Upvotes: 2
Views: 1664
Reputation: 243579
The wanted processing cannot be specified as a single XPath 1.0 expression because, by definition, any function that expects a single string argument but is given a node-set takes the string value of only the first node (in document order) of that node-set.
Also, unlike XPath 2.0, XPath 1.0 does not allow a function call as a location step.
Therefore, one solution is to issue this XPath expression:
substring-after((id('product-grid')//p[@class="price"])[$k], "Price: ")
N times, substituting $k in each expression with 1, 2, ..., N, where N is the result of evaluating another XPath expression:
count(id('product-grid')//p[@class="price"])
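The two-expression approach above might look roughly like this in PHP. This is only a sketch under a few assumptions: it uses DOMXPath::evaluate() (which returns typed results such as numbers and strings), it selects div elements because that is what the sample markup contains, and it swaps id('product-grid') for an attribute test, since id() only matches attributes the parser knows to be of type ID. The variable names are illustrative.

```php
<?php
// Sample markup from the question (prices are in <div class="price">).
$html = '<div id="product-grid">
<ul>
<li><div class="price">Cash Price: $20.00</div></li>
<li><div class="price">Cash Price: $30.00</div></li>
<li><div class="price">Cash Price: $40.00</div></li>
</ul>
</div>';

$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);

// Assumption: attribute test instead of id('product-grid'), see note above.
$nodes = '//*[@id="product-grid"]//div[@class="price"]';

// First expression: N, the number of matching nodes (count() yields a float).
$n = (int) $xpath->evaluate(sprintf('count(%s)', $nodes));

// Second expression, issued N times with $k = 1, 2, ..., N.
$prices = array();
for ($k = 1; $k <= $n; $k++) {
    $expr = sprintf('substring-after((%s)[%d], "Price: ")', $nodes, $k);
    $prices[] = $xpath->evaluate($expr);
}

print_r($prices);
```

Each evaluate() call returns a plain string, so $prices ends up as an ordinary PHP array rather than a DOMNodeList.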
Using XPath 2.0, one can do this with a single, simple expression:
id('product-grid')//p[@class="price"]/substring-after(., "Price: ")
which, when evaluated, produces exactly the wanted sequence of strings.
Upvotes: 1
Reputation: 1711
You have to apply substring-after after getting your list.
id('product-grid')//div[@class="price"][substring-after(., 'Price: ')]
This should work.
EDIT: This seems to be working. However, I can't test the return value, as I don't know how to get the substring'd value. What do you use?
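On the question of what the query above returns: in XPath 1.0 a predicate is converted to a boolean, so a non-empty substring-after() result keeps the node and an empty one drops it, but the node itself comes back unchanged. A small sketch of that behaviour, using a made-up second list item that has no "Price: " prefix (the markup here is an assumption for illustration only):

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<ul>
<li><div class="price">Cash Price: $20.00</div></li>
<li><div class="price">No amount listed</div></li>
</ul>');
$xpath = new DOMXPath($doc);

// The predicate acts as a filter: nodes whose text contains "Price: "
// followed by something are kept; the others are dropped.
$nodes = $xpath->query('//div[@class="price"][substring-after(., "Price: ")]');

$values = array();
foreach ($nodes as $node) {
    // The node value is still the full text, prefix included.
    $values[] = $node->nodeValue;
}

print_r($values);
```

So the substring still has to be taken afterwards, either in PHP or with one XPath call per node.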
Upvotes: 1
Reputation: 83672
Sorry, but I don't think this is possible in one step. As far as I know, XPath 1.0 does not support function calls at the end of an XPath path. The answer here indicates the same.
Furthermore, you do not need id('product-grid') as the first path part, because the id is on the root element and does not need to be selected specially. If your sample XML is just a fragment of a larger XML document, the id() might be necessary though.
The following works as expected:
$xml = new DOMDocument();
$xml->loadXML('<div id="product-grid">
<ul>
<li><div class="price">Cash Price: $20.00</div></li>
<li><div class="price">Cash Price: $30.00</div></li>
<li><div class="price">Cash Price: $40.00</div></li>
</ul>
</div>');
$xpath = new DOMXPath($xml);
foreach ($xpath->query('//div[@class="price"]') as $n) {
var_dump(substr($n->nodeValue, strpos($n->nodeValue, '$')));
}
Upvotes: 1