Reputation: 177
Hi, below is my code, which is not producing the expected result.
It should first fetch the complete HTML content of the page using cURL and then extract a link from that content with a regular expression.
The regular expression gives the expected result when I assign the HTML to $htmlContent directly, but not when the content comes from cURL.
For example, when I assign the content below to the $htmlContent variable, the regular expression returns the correct result.
$htmlContent = '<table id="ctl00_pageContent_ctl00_productList" class="product-list" cellspacing="0" border="0" style="width:100%;border-collapse:collapse;">
<tr>
<td class="product-list-item-container" style="width:100%;">
<div class="product-list-item" onkeypress="javascript:return WebForm_FireDefaultButton(event, \'ctl00_pageContent_ctl00_productList_ctl00_imbAdd\')">
<a href="/W10542314D/WDoorGasketandLatchSt.aspx">
<img class="product-list-img" src="/images/products/display/applianceparts.jpg" title="W10542314 D/W Door Gasket & Latch St " alt="W10542314 D/W Door Gasket & Latch St " border="0" />
</a>
<div class="product-list-options">
<h5><a href="/W10542314D/WDoorGasketandLatchSt.aspx">W10542314 D/W Door Gasket & Latch St</a></h5>
<div class="product-list-cost"><span class="product-list-cost-label">Online Price:</span> <span class="product-list-cost-value">$33.42</span></div>
</div>
';
Below is my complete code:
<?php
$url = "http://www.universalapplianceparts.com/search.aspx?find=W10130694";
$ch1= curl_init();
curl_setopt ($ch1, CURLOPT_URL, $url );
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch1,CURLOPT_VERBOSE,1);
curl_setopt($ch1, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
curl_setopt ($ch1, CURLOPT_REFERER,'http://www.google.com'); //just a fake referer
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch1,CURLOPT_POST,0);
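curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); // boolean switch: follow redirects
curl_setopt($ch1, CURLOPT_MAXREDIRS, 20); // the redirect limit goes here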
$htmlContent= curl_exec($ch1);
echo $htmlContent;
$value=preg_match_all('/.*<div.*class=\"product\-list\-options\".*>.*<a href="(.*)">.*<\/a>.*<\/div>/s',$htmlContent,$matches);
print_r($matches);
$value=preg_match_all('/.*<div.*class=\"product\-list\-item\".*>.*<a href=\"(.*)\">.*<img.*>.*<\/div>/s',$htmlContent,$matches);
print_r($matches);
This code echoes the HTML content of the web page. The regular expressions should then return the href of the anchor tag inside the divs whose class names are product-list-options and product-list-item.
Current output:
http://www.universalapplianceparts.com/termsofservice.aspx
Expected output (as an array value): /W10130694LatchAssyWhiteHandle.aspx
Any help would be appreciated.
Thanks
Upvotes: 2
Views: 49
Reputation: 2557
Try this pattern (in PHP, wrap it in delimiters and add the s modifier so that . also matches line breaks):
class="product-list-item".*?<a href="(.*?)".*?class="product-list-options"
Output:
MATCH 1
1. [23040-23075] `/W10130694LatchAssyWhiteHandle.aspx`
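Explanation:
class="product-list-item" : matches the literal text class="product-list-item"
.*? : matches any character, as few times as possible (lazy)
<a href=" : matches the literal text <a href="
(.*?)" : captures the text inside href="", up to the closing quote
.*? : again matches any character, as few times as possible
class="product-list-options" : matches the literal text class="product-list-options"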
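
To show how the pattern might be applied in PHP, here is a minimal sketch. It runs against a small inline sample rather than the live page: the sample HTML is only modelled on the markup in the question, and the second product, the footer link and the variable names $lazy, $greedy, $links and $greedyLinks are assumptions made for illustration. The greedy pattern is a simplified stand-in for the question's expressions, not a copy of them.

```php
<?php
// Hypothetical sample modelled on the question's markup. The second product
// and the footer link are invented for illustration.
$htmlContent = <<<'HTML'
<div class="product-list-item">
<a href="/W10130694LatchAssyWhiteHandle.aspx"><img class="product-list-img" src="/images/products/display/applianceparts.jpg" /></a>
<div class="product-list-options">
<h5><a href="/W10130694LatchAssyWhiteHandle.aspx">W10130694 Latch Assy</a></h5>
</div>
</div>
<div class="product-list-item">
<a href="/ExamplePart.aspx"><img class="product-list-img" src="/images/products/display/applianceparts.jpg" /></a>
<div class="product-list-options">
<h5><a href="/ExamplePart.aspx">Example Part</a></h5>
</div>
</div>
<div class="footer"><a href="/termsofservice.aspx">Terms of Service</a></div>
HTML;

// Lazy pattern from the answer. "~" is the delimiter, so slashes and quotes
// need no escaping; the "s" modifier lets "." match newlines as well.
$lazy = '~class="product-list-item".*?<a href="(.*?)".*?class="product-list-options"~s';
preg_match_all($lazy, $htmlContent, $matches);
$links = $matches[1]; // one captured href per product block
print_r($links);

// Simplified greedy variant for comparison: ".*" runs to the end of the
// input and backtracks, so the capture lands on the last link on the page.
$greedy = '~class="product-list-options".*<a href="(.*)"~s';
preg_match_all($greedy, $htmlContent, $greedyMatches);
$greedyLinks = $greedyMatches[1];
print_r($greedyLinks);
```

On this sample the lazy pattern yields the two product links, while the greedy variant yields only the footer link, which mirrors the termsofservice.aspx result reported in the question. With cURL, $htmlContent would be the return value of curl_exec() instead of the inline sample.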
Upvotes: 2