Reputation: 697
I'm trying to load an HTML file from an Amazon URL and extract the product price, using a simple PHP function in Yii.
I started by fetching the entire page with the PHP function file_get_contents, and then extract only the price from the HTML with a DOM parser.
I'm using a DOM parser to read the HTML file. It has convenient functions for reading the tags of an HTML file. This is the parser:
http://simplehtmldom.sourceforge.net/
The URL that PHP analyzes can be on amazon.com, amazon.co.uk, amazon.it, etc. In the future, this feature will also be used to analyze URLs from sites other than Amazon.
I created a simple function that extracts the price from a URL. Here it is:
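public function findAmazonPriceFromUrl($url) {
    Yii::import('ext.HtmlDOMParser.*');
    require_once('simple_html_dom.php');
    $html = file_get_html($url);
    // file_get_html() returns false when the download fails or is too large;
    // calling a method on false would be a fatal error.
    if (!$html) {
        return null;
    }
    $item = $html->getElementsById('actualPriceValue');
    if ($item && $item[0]->firstChild()) {
        $price = $item[0]->firstChild()->innertext;
    } else {
        $item = $html->getElementsById('current-price');
        // Neither price element may exist on the page that came back
        $price = $item ? $item[0]->innertext : null;
    }
    return $price;
}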
The file_get_html function is the following:
function file_get_html($url) {
    $dom = new simple_html_dom();
    $contents = file_get_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) {
        return false;
    }
    $dom->load($contents);
    return $dom;
}
I noticed that after a few requests (to various links), I always get an error from the server (Error 500). I checked my Apache log file, but it shows nothing wrong.
Could Amazon be blocking my requests after a certain time? How can I fix it?
Thanks in advance for the help
Upvotes: 0
Views: 2724
Reputation: 11
I had the same problem, and this is my fix: I run the script again if the image is not parsed. The image is the first thing parsed in my PHP script, so I check whether that worked, that is, whether Amazon returned the product information. I hope it helps.
if ($html->find('#main-image')) {
    foreach ($html->find('#main-image') as $e) {
        echo '<span href="'. $e->src . '" class="imgblock parseimg">
            <img src="'. $e->src . '" class="resultimg" alt="'.$name.'" title="'.$name.'">
            </span>
            <input type="hidden" name="my-item-img" value="'. $e->src . '" />';
    }
} else {
    // Nothing was parsed, so run the script again and stop this run
    gethtml($url,$domain);
    die;
}
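As written, the retry above has no upper limit, so if Amazon keeps refusing the request it could loop for a long time. Below is a minimal sketch of a bounded retry that also looks at the HTTP status before parsing. The helper names (httpStatusFromHeaders, retryWithBackoff, fetchPage), the User-Agent string, the timeout and the delays are all my own assumptions, not part of Yii or simple_html_dom, and I can't say for sure that this is what Amazon needs. It relies on the $http_response_header variable that PHP's http:// wrapper fills in after file_get_contents.

```php
<?php
// Hypothetical helper: pull the numeric status out of the raw header lines
// exposed by PHP's http:// wrapper (e.g. "HTTP/1.1 503 Service Unavailable").
function httpStatusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects there are several status lines; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: call $attempt up to $maxTries times, sleeping between
// tries, and return the first non-null result (or null if every try fails).
function retryWithBackoff(callable $attempt, int $maxTries = 3, int $baseDelayMs = 500)
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $result = $attempt($try);
        if ($result !== null) {
            return $result;
        }
        if ($try < $maxTries) {
            // Wait longer after each failure: base, 2x base, 4x base, ...
            usleep($baseDelayMs * 1000 * (2 ** ($try - 1)));
        }
    }
    return null;
}

// Hypothetical helper: fetch a page and return its body only on HTTP 200.
function fetchPage(string $url): ?string
{
    $context = stream_context_create(['http' => [
        'timeout'       => 10,    // assumed value, in seconds
        'ignore_errors' => true,  // still read the body on 4xx/5xx responses
        'header'        => "User-Agent: Mozilla/5.0 (compatible; PriceChecker/1.0)\r\n", // assumed value
    ]]);
    $body = @file_get_contents($url, false, $context);
    // $http_response_header is set in this scope by the http:// wrapper.
    $status = isset($http_response_header)
        ? httpStatusFromHeaders($http_response_header)
        : null;
    return ($body !== false && $status === 200) ? $body : null;
}
```

It could be used like this: $contents = retryWithBackoff(function () use ($url) { return fetchPage($url); }); and if $contents is still null after the last try, skip that product instead of calling $dom->load(). Logging the status code when a try fails should also show whether the failures really come from Amazon.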
Upvotes: 1