Reputation: 1513
Here is my code:
function get_data($url)
{
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++)
{
    $post = rtrim(fgets($urllist));
    echo $post;
    $html = get_data($post);
    echo $html;
}
Problem: when I call get_data("http://url.com") with a literal URL, I get the right HTML back. But when I pass the URL through a variable, $html comes back empty.
$post holds the right URL; I checked it. Isn't get_data($post); the right way to call it?
curl_getinfo() gives me this:
array(20) {
["url"]=> string(68) "http://secret-url.com"
["content_type"]=> string(9) "text/html"
["http_code"]=> int(301)
["header_size"]=> int(255)
["request_size"]=> int(340)
["filetime"]=> int(-1)
["ssl_verify_result"]=> int(0)
["redirect_count"]=> int(0)
["total_time"]=> float(0.095589)
["namelookup_time"]=> float(0.012224)
["connect_time"]=> float(0.049399)
["pretransfer_time"]=> float(6.5E-5)
["size_upload"]=> float(0)
["size_download"]=> float(0)
["speed_download"]=> float(0)
["speed_upload"]=> float(0)
["download_content_length"]=> float(0)
["upload_content_length"]=> float(0)
["starttransfer_time"]=> float(0.095534)
["redirect_time"]=> float(0)
}
Upvotes: 2
Views: 3901
Reputation: 8641
Will $html = file_get_contents($url);
suffice? As the record shows, it didn't. =)
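For reference, a minimal sketch of that file_get_contents() route (the user-agent string below is just an illustration, and the stream context is optional); PHP's http wrapper follows Location: redirects on its own by default:

$context = stream_context_create(array(
    'http' => array(
        'timeout'    => 15,
        'user_agent' => 'Mozilla/5.0 (compatible; example-fetcher)', // illustrative UA
    ),
));
$html = file_get_contents($url, false, $context);
if ($html === false) {
    echo "file_get_contents() failed for $url\n";
}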
EDIT: to sum up the conversation with a legitimate answer, change your cURL code into the following, which adds the CURLOPT_FOLLOWLOCATION directive; optionally constrain cURL with CURLOPT_MAXREDIRS:
function get_data($url) {
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // follow Location: headers, i.e. HTTP 30x redirect status codes
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // give up following redirects after 5 hops, in case they are circular
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
    $data = curl_exec($ch);
    // if in doubt about what's going on, inspect curl_getinfo($ch):
    // var_dump(curl_getinfo($ch));
    curl_close($ch);
    return $data;
}
//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++) {
    $post = rtrim(fgets($urllist));
    echo $post;
    $html = get_data($post);
    echo $html;
}
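As a side note, a sketch of my own (not part of the code above): reading until EOF instead of looping a fixed 50 times copes with link files of any length, and closes the handle when done:

$urllist = fopen("links.txt", "r");
while (($line = fgets($urllist)) !== false) {
    $post = rtrim($line);
    if ($post === "") {
        continue; // skip blank lines
    }
    echo $post;
    echo get_data($post);
}
fclose($urllist);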
Optionally, since it seems like you're doing this more than once (returning to the pages in your links.txt), set up a cookie container so the sites you visit can recognize you, and reuse that information on consecutive runs:
// cookie file path, writable by the httpd user:
$cookie_file = "/tmp/cookie/cookie1.txt";
// save cookies received in responses to this file when the handle closes
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
// read cookies from this file and send them with matching host requests
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
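One caveat with the path above (my note, not from the conversation): cURL can only write the jar if the directory exists and is writable by the PHP process, so a guard like this may be needed first:

// create the cookie directory if it is missing (0700 keeps it private)
if (!is_dir(dirname($cookie_file))) {
    mkdir(dirname($cookie_file), 0700, true);
}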
Upvotes: 0
Reputation: 57244
Try this code out.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
    // Edit: follow redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);
    // debug output; remove once the transfers work
    var_dump(curl_getinfo($ch));
    curl_close($ch);
    return $data;
}
//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++)
{
    if ($post = rtrim(fgets($urllist)))
    {
        echo $post;
        echo get_data($post);
    }
    else
    {
        echo "No URL provided!";
    }
    echo "\n<hr>\n";
}
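If a URL still comes back empty after these changes, it can help to surface cURL's own error message. A hedged variant of get_data() along those lines (curl_errno() and curl_error() are standard cURL functions; the rest mirrors the code above):

function get_data($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    $data = curl_exec($ch);
    if ($data === false) {
        // curl_error() explains why the transfer failed (DNS, timeout, ...)
        echo "cURL error " . curl_errno($ch) . ": " . curl_error($ch) . "\n";
    }
    curl_close($ch);
    return $data;
}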
Upvotes: 2