Reputation: 55
I'm trying to get the product images of a website using this piece of code:
<?php
$url="http://www.akasa.com.tw/update.php?tpl=product/cpu.gallery.tpl&type=Fanless Chassis&type_sub=Fanless Mini ITX&model=A-ITX19-A1B";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, "User-Agent: Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING, "");
$pagebody=curl_exec($ch);
curl_close ($ch);
$html=str_get_html($pagebody);
print_r($html);
In the PhpStorm debugger I can inspect the variables, and $pagebody holds this value:
<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. If you think this is an error, please contact the webmaster. <br><br>Your support ID is: 4977197659118049932</body></html>
When I use a browser, the page loads perfectly and the page source contains all the info I need, but I would like to automate scraping some images from it. Any idea how to find out what I need to send with cURL so that the website doesn't see me as a robot (I guess that is the problem), or how to go about diagnosing problems like this?
Upvotes: 2
Views: 2844
Reputation: 4560
Basically, you need to encode your query string arguments so that all special characters (here, the spaces in the type and type_sub values) are properly represented in the URL. You can use http_build_query for this purpose, so your URL construction may look something like this:
$url = implode('?', [
'http://www.akasa.com.tw/update.php',
http_build_query([
'tpl' => 'product/cpu.gallery.tpl',
'type' => 'Fanless Chassis',
'type_sub' => 'Fanless Mini ITX',
'model' => 'A-ITX19-A1B',
])
]);
and then the rest of your code.
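To see what the encoding actually changes, here is a minimal sketch that prints the resulting URL in both encodings http_build_query supports. The helper names build_gallery_url and fetch_body are hypothetical and only there for illustration. Whether the server prefers '+' or '%20' for spaces is an assumption you would have to test.

```php
<?php
// Hypothetical helper: joins a base URL and an encoded query string.
function build_gallery_url(string $base, array $params, int $encType = PHP_QUERY_RFC1738): string
{
    // '' = no numeric prefix, '&' = argument separator
    return $base . '?' . http_build_query($params, '', '&', $encType);
}

// Hypothetical helper: same cURL options as in the question, minus the
// user agent. It is defined but not called here, so the script runs offline.
function fetch_body(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_ENCODING, "");
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

$base = 'http://www.akasa.com.tw/update.php';
$params = [
    'tpl'      => 'product/cpu.gallery.tpl',
    'type'     => 'Fanless Chassis',
    'type_sub' => 'Fanless Mini ITX',
    'model'    => 'A-ITX19-A1B',
];

// Default encoding (RFC 1738): spaces become '+', '/' becomes '%2F'.
$url = build_gallery_url($base, $params);
echo $url, PHP_EOL;

// Alternative encoding (RFC 3986): spaces become '%20' instead.
$urlRaw = build_gallery_url($base, $params, PHP_QUERY_RFC3986);
echo $urlRaw, PHP_EOL;
```

If the default form is still rejected, trying the RFC 3986 variant is a cheap next step before looking at headers or cookies.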
Upvotes: 4