andreapavan

Reputation: 697

How to extract the price from an Amazon URL without the Amazon API

I'm trying to load the HTML of an Amazon product page and extract the product price with a simple PHP function in Yii. I first fetch the entire page with PHP's file_get_contents, and then extract only the price from the HTML with a DOM parser.

To read the HTML I'm using a DOM parser (PHP Simple HTML DOM Parser), which has convenient functions for finding the tags of an HTML document. This is the parser:

http://simplehtmldom.sourceforge.net/

The URL being analyzed can be on amazon.com, amazon.co.uk, amazon.it, etc. In the future this feature will also be used to analyze URLs from sites other than Amazon.
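Since the same function has to accept several Amazon storefronts now and other sites later, one option is to check the URL's host before choosing how to parse the page. A minimal sketch, assuming the hypothetical helper name `isAmazonUrl` and the storefront list below (taken from the domains mentioned above; extend as needed), using PHP's built-in `parse_url`:

```php
<?php
// Hypothetical list of supported Amazon storefront hosts; extend as needed.
const AMAZON_HOSTS = ['amazon.com', 'amazon.co.uk', 'amazon.it'];

/**
 * Returns true if the URL's host is one of the storefronts above,
 * either exactly or as a subdomain such as "www.amazon.it".
 */
function isAmazonUrl(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // not an absolute URL, or no host component
    }
    $host = strtolower($host);
    foreach (AMAZON_HOSTS as $domain) {
        $suffix = '.' . $domain;
        // Exact match, or a host ending in ".<domain>" (a subdomain)
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}
```

Non-Amazon hosts can then be routed to a different, site-specific parser later on.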

I wrote a simple function that extracts the price from a URL:

public function findAmazonPriceFromUrl($url) {
    Yii::import('ext.HtmlDOMParser.*');
    require_once('simple_html_dom.php');

    $html = file_get_html($url);
    $item = $html->getElementsById('actualPriceValue');
    if ($item) {
        $price = $item[0]->firstChild()->innertext;
    } else {
        $item = $html->getElementsById('current-price');
        $price = $item[0]->innertext;
    }
    return $price;
}
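The function above assumes that `file_get_html()` returned a parsed document and that one of the two ids is present. When Amazon answers with an error or a CAPTCHA page instead of the product page, `$html` is `false`, and `$html->getElementsById()` is then a fatal error, which typically shows up as an HTTP 500; if neither id exists, `$item[0]` is undefined as well. A minimal sketch of the same fallback logic with explicit guards, swapping in PHP's built-in `DOMDocument`/`DOMXPath` for Simple HTML DOM so it is self-contained; the helper name `findPriceInHtml` is hypothetical, and whether Amazon still uses these two ids is an assumption:

```php
<?php
/**
 * Returns the text content of the first non-empty price element found,
 * trying the ids from the question in order, or null when the page is
 * empty or contains none of them (e.g. a CAPTCHA or error page).
 */
function findPriceInHtml(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML: collect parse errors quietly.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Ids taken from the question; Amazon may change its markup at any time.
    foreach (['actualPriceValue', 'current-price'] as $id) {
        $nodes = $xpath->query('//*[@id="' . $id . '"]');
        if ($nodes !== false && $nodes->length > 0) {
            $text = trim($nodes->item(0)->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }
    return null; // no known price element on this page
}
```

The caller can then treat `null` as "no price found, possibly blocked" and decide whether to retry later, instead of the script dying on a method call on `false`.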

This is the file_get_html function:

function file_get_html($url) {
    $dom = new simple_html_dom();
    $contents = file_get_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) {
        return false;
    }
    $dom->load($contents);
    return $dom;
}
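With the default HTTP stream settings, `file_get_contents()` returns `false` (and emits a warning) when the server answers with an error status, so `file_get_html()` above hands `false` back to its caller. A minimal sketch of a fetch helper that makes every failure explicit, assuming the hypothetical name `fetchPage`, a 10-second timeout, an illustrative `User-Agent` string, and a size limit playing the role of `MAX_FILE_SIZE` (all of these values are assumptions, and nothing guarantees Amazon will serve the page):

```php
<?php
/**
 * Fetches a URL and returns its body, or null on any failure: network
 * error, non-2xx HTTP status, empty body, or a body larger than $maxBytes.
 */
function fetchPage(string $url, int $maxBytes = 600000): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'timeout'       => 10, // seconds (assumed value)
            'user_agent'    => 'Mozilla/5.0 (compatible; PriceChecker/0.1)', // illustrative
            'ignore_errors' => true, // return the body on 4xx/5xx so the status can be checked
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null; // network error, missing file, etc.
    }

    // For http(s) URLs PHP fills $http_response_header in this scope with
    // the headers of every response, including redirects; keep the last status.
    $status = null;
    foreach ($http_response_header ?? [] as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    if ($status !== null && ($status < 200 || $status >= 300)) {
        return null; // e.g. 503 when the request is throttled
    }

    if ($body === '' || strlen($body) > $maxBytes) {
        return null;
    }
    return $body;
}
```

A caller would check `fetchPage($url) === null` and back off or log the failure before handing the body to the parser.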

I noticed that after a few requests (to various links) I always get an error from the server (HTTP 500). I checked my Apache log file, but I found nothing wrong there.

Could Amazon be blocking my requests after a certain number of them? How can I fix this?

Thanks in advance for the help

Upvotes: 0

Views: 2724

Answers (1)

Rati Gvelesiani

Reputation: 11

I had the same problem, and this is my fix: I run the script again if the image is not parsed. The image is the first thing my PHP script parses, so I check whether that worked, i.e. whether Amazon actually returned the product information. I hope it helps.

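// Look for the main product image; it is the first thing this script parses.
$images = $html->find('#main-image');
if ($images) {
    // Amazon returned the product page: output each image found.
    // $name is set earlier in the script (not shown).
    foreach ($images as $e) {
        echo '<span href="' . $e->src . '" class="imgblock parseimg">
                 <img src="' . $e->src . '" class="resultimg" alt="' . $name . '" title="' . $name . '">
              </span>
              <input type="hidden" name="my-item-img" value="' . $e->src . '" />';
    }
} else {
    // No image: Amazon did not return the product page, so fetch and parse
    // it again with gethtml() (a helper defined elsewhere in this script).
    gethtml($url, $domain);
    die;
}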
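The retry above calls `gethtml()` again and then `die`s, and as shown it has no upper limit on attempts, so a page that keeps failing could be requested over and over. A minimal sketch of the same idea with a bounded number of attempts and an exponentially growing pause between them; the name `fetchWithRetry`, the attempt count and the delays are hypothetical, and the fetch and the success check are passed in as callables so the sketch does not depend on any particular parser:

```php
<?php
/**
 * Calls $fetch up to $maxAttempts times until $isValid accepts the result,
 * sleeping between attempts with exponential backoff. Returns the first
 * valid result, or null if every attempt failed.
 */
function fetchWithRetry(callable $fetch, callable $isValid, int $maxAttempts = 3, int $baseDelayMs = 1000)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch();
        if ($isValid($result)) {
            return $result;
        }
        if ($attempt < $maxAttempts) {
            // Wait 1x, 2x, 4x ... the base delay; usleep() takes microseconds.
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
    return null; // give up instead of retrying forever
}
```

With Simple HTML DOM, `$fetch` could be `function () use ($url) { return file_get_html($url); }` and `$isValid` could be `function ($html) { return $html && $html->find('#main-image'); }`.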

Upvotes: 1
