Reputation: 2409
I am trying to retrieve and echo the content of a div from an external site using PHP and XPath.
This is an excerpt from the page, showing the relevant code:
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head><title>Handbags - Clutches - Kara Ross New York</title></head>
<body>
<div id="Container">
<div id="AjaxLoading">...</div> ...
<div id="Wrapper">
<div class="productlist-page"> ...
<div class="Content Wide " id="LayoutColumn1"> ...
<div align="center">
<div class="Block CategoryContent Moveable Panel" id="CategoryContent">
<form name="frmCompare" id="frmCompare">
<table><tr><td valign="top">...</td>
<td valign="top">
<ul class="ProductList ">
<li class="Odd">
<div class="ProductImage QuickView" data-product="261">
<a href="http://www.kararossny.com/electra-clutch-in-oil-spill-lizard-and-hologram-with-gunmetal-hardware-and-hematite/">
<img src="http://cdn2.bigcommerce.com/n-arxsrf/t0qdc/products/261/images/1382/electra_oil_spill__08182.1402652812.500.375.jpg?c=2" alt="Kara Ross Electra Clutch in Oil Spill Lizard and Hologram with Gunmetal Hardware and Hematite Gemstone on Closure"/>
</a>
</div>
<div class="ProductDetails">...</div>
<div class="ProductPriceRating">...</div>
<div class="ProductCompareButton" style="display:none">...</div>
<div class="ProductActionAdd" style="display:none;">...</div>
</li>
</ul>
</td>
<td valign="top" align="center">...</td>
</tr>
</table>
<div class="product-nav btm"> ... </div>
</form>
...
This is my code so far:
$url = 'http://www.kararossny.com/clutches/?sort=featured&page=1';
$dom = new DOMDocument;
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//div[class="ProductImage QuickView"]');
foreach($elements[0] as $child) {
echo $child . "\n";
}
My desired output for the page linked would be:
<a href="http://www.kararossny.com/electra-clutch-in-oil-spill-lizard-and-hologram-with-gunmetal-hardware-and-hematite/">
<img src="http://cdn2.bigcommerce.com/n-arxsrf/t0qdc/products/261/images/1382/electra_oil_spill__08182.1402652812.500.375.jpg?c=2" alt="Kara Ross Electra Clutch in Oil Spill Lizard and Hologram with Gunmetal Hardware and Hematite Gemstone on Closure"/>
</a>
Any idea what I am doing wrong? I think my xpath might be wrong, but I am not sure.
Thanks!
Upvotes: 2
Views: 2539
Reputation: 23627
There are three likely reasons why you cannot select the markup you want.
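1 - To select your class attribute in your XPath predicate you need to use the attribute axis. Either prefix the attribute name with attribute:: or with an @ sign. Without the prefix, class is read as the name of a child element, and no div has a child element called class. So you should use @class to select the class attribute.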
2 - An XPath expression is made of one or more steps. Each step defines a context that limits the scope of the next step. The last step contains the set you are selecting. Since your last step is a div, you are actually selecting a div, and not an a. You should use the following expression to select the a node and its contents:
//div[@class="ProductImage QuickView"]/a
3 - Finally, your page has a default namespace declaration:
xmlns="http://www.w3.org/1999/xhtml"
That will require you to either register the namespace or ignore it by selecting your elements with wildcards (not by their names, but using *). Most XPath APIs do not automatically map default namespaces, and if a selector is not qualified with a namespace prefix, unprefixed element names are treated as belonging to no namespace. That means that if you try to select a <div> using the expression //div, you may get an empty set. If you are not selecting anything, try ignoring namespaces like this:
//*[local-name()='div'][@class="ProductImage QuickView"]/*[local-name()='a']
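The three points above can be checked with a small self-contained sketch. The markup is a hypothetical, trimmed-down stand-in for the page in the question (the example.com URL and file names are made up), and the x prefix is an arbitrary choice. It also shows one assumption worth testing on your side: as far as I know, the namespace issue only appears when the document is parsed as XML (loadXML), because the HTML parser behind loadHTML and loadHTMLFile does not put elements into the XHTML namespace.

```php
<?php
// Hypothetical sample markup, loosely modelled on the page in the question.
$markup = <<<'XHTML'
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ProductImage QuickView" data-product="261">
<a href="http://example.com/product/"><img src="product.jpg" alt="Product"/></a>
</div>
</body>
</html>
XHTML;

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

// --- Parsed as HTML (what loadHTMLFile does) ---
$html = new DOMDocument();
$html->loadHTML($markup);
$htmlXpath = new DOMXPath($html);

// Point 1: without "@", class is tested as a child element, so nothing matches.
$noAt = $htmlXpath->query('//div[class="ProductImage QuickView"]')->length;   // 0
$withAt = $htmlXpath->query('//div[@class="ProductImage QuickView"]')->length; // 1

// Point 2: the last step decides what is selected; /a returns the link, not the div.
$anchor = $htmlXpath->query('//div[@class="ProductImage QuickView"]/a')->item(0);

// --- Parsed as XML: the default namespace now matters (point 3) ---
$xml = new DOMDocument();
$xml->loadXML($markup);
$xmlXpath = new DOMXPath($xml);

// Unprefixed names mean "no namespace", so this finds nothing.
$unprefixed = $xmlXpath->query('//div')->length; // 0

// Option A: register a prefix (here "x", an arbitrary name) for the XHTML namespace.
$xmlXpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xmlXpath->query('//x:div[@class="ProductImage QuickView"]/x:a')->length; // 1

// Option B: ignore namespaces with wildcards and local-name().
$wildcard = $xmlXpath->query(
    "//*[local-name()='div'][@class='ProductImage QuickView']/*[local-name()='a']"
)->length; // 1

echo "HTML parser: without @ = $noAt, with @ = $withAt, last step = {$anchor->nodeName}\n";
echo "XML parser: //div = $unprefixed, prefixed = $prefixed, wildcard = $wildcard\n";
```

If //div already returns nodes for you after loadHTMLFile, the namespace is not the problem and the first two points are the ones to fix.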
Upvotes: 2
Reputation: 7948
You forgot to add @ before class, and /a at the end of your query, since you are targeting the link. After that, use saveHTML() to get its markup. Consider this example:
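$url = 'http://www.kararossny.com/clutches/?sort=featured&page=1';
$dom = new DOMDocument();
// Suppress the parser warnings that real-world HTML usually triggers
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
// @class tests the attribute; the trailing /a selects the link inside the div
$elements = $xpath->query('//div[@class="ProductImage QuickView"]/a');
// Serialize only the first matching <a> node, including its children
$link = $dom->saveHTML($elements->item(0));
echo $link;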
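If you want every product link rather than only the first one, the same query can be looped over. This is a minimal sketch that uses hypothetical inline markup (the /first/ and /second/ paths are made up) instead of fetching the live page:

```php
<?php
// Hypothetical markup standing in for the fetched product list.
$markup = '<ul>'
    . '<li><div class="ProductImage QuickView"><a href="/first/"><img src="first.jpg" alt="First"></a></div></li>'
    . '<li><div class="ProductImage QuickView"><a href="/second/"><img src="second.jpg" alt="Second"></a></div></li>'
    . '</ul>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// DOMNodeList is traversable, so foreach works on the query result itself.
$links = [];
foreach ($xpath->query('//div[@class="ProductImage QuickView"]/a') as $a) {
    // A DOM node cannot be echoed directly; serialize it with saveHTML() first.
    $links[] = $dom->saveHTML($a);
}

echo implode("\n", $links), "\n";
```

Note that the loop runs over the node list returned by query(), not over $elements[0] as in the question.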
Upvotes: 4
Reputation: 89315
Yes, your XPath is a bit off.
In XPath, to filter an element by its attribute value you have to put @ at the beginning of the attribute name. So your XPath should have been as follows:
//div[@class="ProductImage QuickView"]
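That expression selects the div itself. To echo only what is inside it, as in the desired output, one option is to serialize each child node of the div. A minimal sketch, assuming hypothetical inline markup in place of the live page:

```php
<?php
// Hypothetical markup; the path and file name are placeholders.
$markup = '<div class="ProductImage QuickView" data-product="261">'
    . '<a href="/product/"><img src="product.jpg" alt="Product"></a>'
    . '</div>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$div = $xpath->query('//div[@class="ProductImage QuickView"]')->item(0);

// Build the "inner HTML" by serializing every child of the div in order.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}

echo trim($inner), "\n";
```

The null check matters on a live page: item(0) returns null when nothing matches, for example if the site changes its markup.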
Upvotes: 3