Reputation: 32776
How can I send headers to a website so that a request from PHP / Apache looks like it came from a browser? I'm trying to scrape a site, but it looks like they send a 404 error if the request comes from another server...
Alternatively, do you know of any other good ways to scrape content from a site?
Also, here is my current code:
<?php
$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,$_GET['url']);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
curl_setopt($curl_handle, CURLOPT_REFERER, "http://google.com");
curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);
$buffer = curl_exec($curl_handle);
curl_close($curl_handle);
echo $buffer;
?>
So, I'll be making an AJAX request like:
/spider.php?url=http://target.com
This returns an empty string. I know it is set up right, though, because if I switch the target to twitter.com it works... What am I missing to make this look like a full browser?
Upvotes: 2
Views: 2837
Reputation: 11325
For cURL, there is the CURLOPT_USERAGENT option for that:
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
However, the site may also check the Referer header, which you can set via:
curl_setopt($ch, CURLOPT_REFERER, "http://<somesite>");
Upvotes: 3
Reputation: 8382
If you're using cURL, you can use the CURLOPT_HTTPHEADER option, which takes an array of headers you wish to send with the request.
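As a rough sketch of what that could look like: the helper names `browser_like_headers()` and `fetch_with_headers()` below are made up for illustration, and the header values are only guesses at what a browser might send, not values the target site is known to check.

```php
<?php
// Hypothetical helper: a list of headers a desktop browser might send.
// The values are illustrative assumptions, not known requirements of any site.
function browser_like_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Connection: keep-alive',
    ];
}

// Hypothetical helper: fetches $url with the headers above.
// Returns the response body as a string, or false if the transfer failed.
function fetch_with_headers(string $url)
{
    $ch = curl_init($url);
    // Each array element is one complete "Name: value" header line.
    curl_setopt($ch, CURLOPT_HTTPHEADER, browser_like_headers());
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($ch);
    if ($body === false) {
        // Log the reason instead of silently returning nothing.
        error_log('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

You would then call something like `echo fetch_with_headers('http://target.com');`. Logging `curl_error()` when `curl_exec()` returns false may also help tell a blocked request apart from a timeout.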
If you're using file_get_contents(), you can pass it a stream context created with stream_context_create().
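A minimal sketch of the stream context approach, assuming `allow_url_fopen` is enabled on your server. The helper names `browser_like_context_options()` and `fetch_with_context()` are hypothetical, and the header values are only examples.

```php
<?php
// Hypothetical helper: HTTP context options for file_get_contents().
// The header values are illustrative assumptions.
function browser_like_context_options(): array
{
    return [
        'http' => [
            'method'        => 'GET',
            // Extra headers go in one string, each terminated by \r\n.
            'header'        => "Accept: text/html\r\n" .
                               "Accept-Language: en-US,en;q=0.5\r\n" .
                               "Referer: http://google.com\r\n",
            'user_agent'    => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)',
            'timeout'       => 5,
            // Return the body even on 4xx/5xx responses so it can be inspected.
            'ignore_errors' => true,
        ],
    ];
}

// Hypothetical helper: fetches $url using the context above.
// Returns the response body as a string, or false on failure.
function fetch_with_context(string $url)
{
    $context = stream_context_create(browser_like_context_options());
    return file_get_contents($url, false, $context);
}
```

Usage would be along the lines of `echo fetch_with_context('http://target.com');`.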
Upvotes: 2