Reputation: 470
I want to fetch the contents of some sites, so I am using file_get_contents or cURL in PHP. The problem is that these functions do not work for every site: for example, they work for google.com but not for iteye.com. My code is below:
$baseurl = 'http://www.iteye.com/';
$contents = file_get_contents($baseurl);
//OR
$ch = curl_init();
$timeout = 10;
curl_setopt ($ch, CURLOPT_URL, $baseurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$list = curl_exec($ch);
I guess this site is blocking these functions (file_get_contents or cURL), so how can I still fetch content from sites like iteye.com?
Upvotes: 1
Views: 1323
Reputation: 66
If you want to be able to fetch any site, I would recommend using cURL.
The main thing to pay attention to: your requests must behave as much as possible like those of a human using a real browser.
Therefore these options should not be missing from your code:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // turns off TLS certificate verification; only relevant for https URLs
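To see why a particular site fails, it can help to look at the HTTP status code and the cURL error text instead of only the returned body. The following is a minimal sketch that combines the browser-like options listed above with that kind of reporting. The function names build_curl_options and fetch_url are made up for this illustration; only the curl_* functions and CURLOPT_*/CURLINFO_* constants come from PHP's cURL extension. CURLOPT_SSL_VERIFYPEER is left at its default here, on the assumption that the target URL is plain http.

```php
<?php
// Hypothetical helpers for illustration; requires the PHP cURL extension.

/** Build a browser-like set of cURL options for the given URL. */
function build_curl_options(string $url, int $timeout = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds to wait for the connection
        CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
        CURLOPT_AUTOREFERER    => true,     // set Referer automatically on redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0',
    ];
}

/** Fetch a URL and return the body together with diagnostic information. */
function fetch_url(string $url, int $timeout = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $timeout));

    $body = curl_exec($ch); // false on a transport-level failure

    // The handle is released automatically when $ch goes out of scope.
    return [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response arrived
        'error'  => curl_error($ch),                       // empty string if no error
    ];
}
```

Calling fetch_url('http://www.iteye.com/') and inspecting the 'status' and 'error' entries should show whether the request was refused (for example with a 403 status) or failed before any response arrived.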
Upvotes: 1
Reputation: 4173
You may need to instruct curl to follow redirects, and also to change the user agent:
$ch = curl_init();
$timeout = 10;
curl_setopt ($ch, CURLOPT_URL, $baseurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$list = curl_exec($ch);
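The same two ideas (follow redirects, send a browser-like user agent) can probably be applied to file_get_contents as well, by passing it a stream context. This is only a sketch: the helper names build_http_context_options and fetch_with_file_get_contents are invented for this example, and it assumes allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helpers for illustration; uses PHP's built-in HTTP stream wrapper.

/** Build HTTP context options that mimic the cURL settings above. */
function build_http_context_options(string $userAgent, float $timeout = 10.0): array
{
    return [
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent, // sent as the User-Agent header
            'follow_location' => 1,          // follow Location redirects
            'max_redirects'   => 5,          // assumed limit, adjust as needed
            'timeout'         => $timeout,   // read timeout in seconds
            'ignore_errors'   => true,       // still return the body on 4xx/5xx
        ],
    ];
}

/**
 * Fetch a URL with file_get_contents and a custom stream context.
 * Returns the body as a string, or false on failure.
 */
function fetch_with_file_get_contents(string $url, string $userAgent)
{
    $context = stream_context_create(build_http_context_options($userAgent));
    return file_get_contents($url, false, $context);
}
```

With 'ignore_errors' set, the body of an error page is returned too, which makes it easier to see what the site actually answered.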
Upvotes: 0