John

Reputation: 177

RegExp Not providing expected result with cURL

Hi, below is my code, which is not producing the expected result.

It should first fetch the complete HTML content of the page using cURL and then apply a regexp to it. The regexp gives the expected result when I assign the HTML content to the variable directly, but not when the content comes from cURL.

For example, when I assign the content below to the $htmlContent variable, the regexp gives the proper result.

$htmlContent = '<table id="ctl00_pageContent_ctl00_productList" class="product-list" cellspacing="0" border="0" style="width:100%;border-collapse:collapse;">
                    <tr>
                        <td class="product-list-item-container" style="width:100%;">
        <div class="product-list-item" onkeypress="javascript:return WebForm_FireDefaultButton(event, &#39;ctl00_pageContent_ctl00_productList_ctl00_imbAdd&#39;)">
                                        <a href="/W10542314D/WDoorGasketandLatchSt.aspx">
              <img class="product-list-img" src="/images/products/display/applianceparts.jpg" title="W10542314 D/W Door Gasket & Latch St  " alt="W10542314 D/W Door Gasket & Latch St  " border="0" />
            </a>
                <div class="product-list-options">
          <h5><a href="/W10542314D/WDoorGasketandLatchSt.aspx">W10542314 D/W Door Gasket &amp; Latch St</a></h5>
 <div class="product-list-cost"><span class="product-list-cost-label">Online Price:</span> <span class="product-list-cost-value">$33.42</span></div>
                                  </div>
'; 

Below is my complete code:

<?php
$url = "http://www.universalapplianceparts.com/search.aspx?find=W10130694";
$ch1= curl_init();
curl_setopt ($ch1, CURLOPT_URL, $url );
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch1,CURLOPT_VERBOSE,1);
curl_setopt($ch1, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
curl_setopt ($ch1, CURLOPT_REFERER,'http://www.google.com');  //just a fake referer
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch1,CURLOPT_POST,0);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, 20);

$htmlContent= curl_exec($ch1);
echo $htmlContent;


$value=preg_match_all('/.*<div.*class=\"product\-list\-options\".*>.*<a href="(.*)">.*<\/a>.*<\/div>/s',$htmlContent,$matches);
print_r($matches);

$value=preg_match_all('/.*<div.*class=\"product\-list\-item\".*>.*<a href=\"(.*)\">.*<img.*>.*<\/div>/s',$htmlContent,$matches);
print_r($matches);

This code echoes the HTML content of the web page. The two regexps should then return the href of the anchor tag inside the div whose class name is product-list-options and product-list-item, respectively.

The current output is:

http://www.universalapplianceparts.com/termsofservice.aspx

Here the regexp appears to read the HTML content from cURL in reverse order and return the first href value it finds in an anchor tag.

Expected output in the array: /W10130694LatchAssyWhiteHandle.aspx

Any help would be appreciated.

Thanks

Upvotes: 2

Views: 49

Answers (1)

Tim007

Reputation: 2557

Try this pattern, which uses lazy quantifiers (.*?) instead of greedy ones (.*):

class="product-list-item".*?<a href="(.*?)".*?class="product-list-options"

Output

MATCH 1
1.  [23040-23075]   `/W10130694LatchAssyWhiteHandle.aspx`
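
To show how the pattern above might be wired into the question's PHP, here is a minimal sketch. The $htmlContent string is a shortened stand-in for the cURL response (an assumption, not the real page), and the / delimiters and the s modifier are carried over from the question's own preg_match_all calls so that . also matches newlines.

```php
<?php
// Shortened stand-in for the cURL response; the real page is much larger.
$htmlContent = <<<'HTML'
<div class="product-list-item">
    <a href="/W10130694LatchAssyWhiteHandle.aspx">
        <img class="product-list-img" src="/images/products/display/applianceparts.jpg" />
    </a>
    <div class="product-list-options">
        <h5><a href="/W10130694LatchAssyWhiteHandle.aspx">W10130694 Latch Assy</a></h5>
    </div>
</div>
<a href="http://www.universalapplianceparts.com/termsofservice.aspx">Terms</a>
HTML;

// The answer's pattern, wrapped in / delimiters with the s (dot-all) modifier.
$pattern = '/class="product-list-item".*?<a href="(.*?)".*?class="product-list-options"/s';

// $matches[1] holds the captured href of each product item that matched.
$count = preg_match_all($pattern, $htmlContent, $matches);

print_r($matches[1]);
```

With this stand-in input, $matches[1] should contain only the product link and not the terms-of-service link that appears later in the markup.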

Explanation:

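class="product-list-item" matches the literal text class="product-list-item"
.*? matches any character, as few times as possible (lazy), so the match stops at the nearest following <a href=" (in PHP, keep the s modifier from the question's code so that . also matches newlines)
<a href=" matches the literal text <a href="
(.*?)" captures the text inside href="", up to the next double quote
class="product-list-options" matches the literal text class="product-list-options"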
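
The difference between greedy and lazy quantifiers is probably what makes the question's regexp look as if it reads the page "in reverse order". The sketch below illustrates this on a hypothetical two-link snippet (the markup and the /first.aspx and /last.aspx paths are made up for illustration): a greedy .* runs to the end of the input and backtracks, so it lands on the last anchor, while the lazy .*? stops at the first one.

```php
<?php
// Hypothetical two-link snippet standing in for a full page.
$html = '<div class="product-list-options"><a href="/first.aspx">First</a></div>'
      . '<div class="footer"><a href="/last.aspx">Last</a></div>';

// Greedy: .* consumes as much as possible, then backtracks to the LAST <a href=".
preg_match('/class="product-list-options".*<a href="(.*)">/s', $html, $greedy);

// Lazy: .*? consumes as little as possible, so it stops at the FIRST <a href=".
preg_match('/class="product-list-options".*?<a href="(.*?)">/s', $html, $lazy);

echo $greedy[1], "\n"; // /last.aspx
echo $lazy[1], "\n";   // /first.aspx
```

On a real page the last anchor is often a footer link, which would be consistent with the terms-of-service URL reported in the question.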

Upvotes: 2
