Maurice69

Reputation: 55

cURL scraping gives me 'Request Rejected': The requested URL was rejected

I'm trying to get the product images of a website using this piece of code:

<?php

$url="http://www.akasa.com.tw/update.php?tpl=product/cpu.gallery.tpl&type=Fanless Chassis&type_sub=Fanless Mini ITX&model=A-ITX19-A1B";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
// CURLOPT_USERAGENT takes only the header value, without a "User-Agent: " prefix
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING, "");
$pagebody=curl_exec($ch);

curl_close ($ch);

// str_get_html() comes from the PHP Simple HTML DOM Parser (simple_html_dom.php)
$html=str_get_html($pagebody);

print_r($html);

PhpStorm lets me inspect the variables, and $pagebody contains this value:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. If you think this is an error, please contact the webmaster. <br><br>Your support ID is: 4977197659118049932</body></html>
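To spot this block page from the script itself rather than by inspecting $pagebody in the debugger, a small check along these lines could be used. This is only a sketch: the function name detectRejectedPage() is hypothetical, it assumes the block page always carries the title and the "Your support ID is:" wording shown above, and it needs PHP 7.1+ for the nullable return type.

```php
<?php
/**
 * Hypothetical helper: returns the support ID if $body is the
 * "Request Rejected" block page, '' if it is that page but no ID is found,
 * and null if it is some other page.
 */
function detectRejectedPage(string $body): ?string
{
    // Assumption: the block page is recognisable by its <title>.
    if (stripos($body, '<title>Request Rejected</title>') === false) {
        return null;
    }
    // The support ID is the run of digits after "Your support ID is:".
    if (preg_match('/Your support ID is:\s*(\d+)/i', $body, $m)) {
        return $m[1]; // kept as a string; it is only an identifier
    }
    return '';
}

// Usage with the response shown above:
$sample = '<html><head><title>Request Rejected</title></head><body>'
        . 'The requested URL was rejected. If you think this is an error, '
        . 'please contact the webmaster. <br><br>'
        . 'Your support ID is: 4977197659118049932</body></html>';

$id = detectRejectedPage($sample);
if ($id !== null) {
    echo "Blocked, support ID: $id\n"; // prints "Blocked, support ID: 4977197659118049932"
}
```

Logging the support ID is handy if you end up contacting the webmaster, as the block page suggests.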

The page in question: http://www.akasa.com.tw/update.php?tpl=product/cpu.gallery.tpl&type=Fanless Chassis&type_sub=Fanless Mini ITX&model=A-ITX19-A1B

When I open it in a browser, the page displays fine and the page source contains all the information I need, but I would like to automate scraping some of the images. Any idea how to find out what I need to send with cURL so that the website doesn't treat me as a robot (I guess that is the problem), or, more generally, how to troubleshoot this kind of problem?

Upvotes: 2

Views: 2844

Answers (1)

Flying

Reputation: 4560

Basically, you need to encode your query string arguments so that all special characters (here, the spaces in "Fanless Chassis" and "Fanless Mini ITX") are properly represented in the URL. You can use http_build_query() for this purpose, so your URL construction might look something like this:

$url = implode('?', [
    'http://www.akasa.com.tw/update.php',
    http_build_query([
        'tpl'      => 'product/cpu.gallery.tpl',
        'type'     => 'Fanless Chassis',
        'type_sub' => 'Fanless Mini ITX',
        'model'    => 'A-ITX19-A1B',
    ])
]);

and then keep the rest of your code as is.
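A fuller sketch, combining the http_build_query() approach above with the cURL options from the question, might look like this. The function names buildGalleryUrl() and fetchPage() are hypothetical, and it assumes PHP 7.1+ with the cURL extension loaded. Note that CURLOPT_USERAGENT expects only the header value, without a "User-Agent: " prefix. The script only prints the encoded URL; it does not perform the request itself.

```php
<?php
/**
 * Hypothetical helper: builds the gallery URL with a properly encoded query.
 * With the default PHP_QUERY_RFC1738 mode, http_build_query() turns spaces
 * into "+" and "/" into "%2F".
 */
function buildGalleryUrl(string $type, string $typeSub, string $model): string
{
    return 'http://www.akasa.com.tw/update.php?' . http_build_query([
        'tpl'      => 'product/cpu.gallery.tpl',
        'type'     => $type,
        'type_sub' => $typeSub,
        'model'    => $model,
    ]);
}

/**
 * Hypothetical helper: fetches $url and returns [HTTP status code, body].
 * Throws a RuntimeException on transport-level errors.
 */
function fetchPage(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        // Header value only; cURL adds the "User-Agent: " name itself.
        CURLOPT_USERAGENT      => 'Mozilla/6.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3',
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '', // accept any encoding cURL supports
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released once $ch goes out of scope.
    return [$status, $body];
}

echo buildGalleryUrl('Fanless Chassis', 'Fanless Mini ITX', 'A-ITX19-A1B'), "\n";
// prints http://www.akasa.com.tw/update.php?tpl=product%2Fcpu.gallery.tpl&type=Fanless+Chassis&type_sub=Fanless+Mini+ITX&model=A-ITX19-A1B
```

To actually download the page you would call, for example, list($status, $body) = fetchPage(buildGalleryUrl('Fanless Chassis', 'Fanless Mini ITX', 'A-ITX19-A1B')); and check $status before handing $body to str_get_html(). Returning the status code alongside the body makes it easier to tell a rejected request from a normal page.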

Upvotes: 4
