Reputation: 10887
I was trying to read a page from the same site using PHP. I came across this good discussion and decided to use the cURL method suggested:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return web page
        CURLOPT_HEADER         => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // timeout on connect
        CURLOPT_TIMEOUT        => 120,   // timeout on response
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

// Now get the webpage
$data = get_web_page( "https://www.google.com/" );

// Display the data (optional)
echo "<pre>" . $data['content'] . "</pre>";
So, for my case, I called get_web_page like this:
$target_url = "http://" . $_SERVER['SERVER_NAME'] . "/press-release/index.html";
$page = get_web_page($target_url);
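The array returned above already carries cURL's diagnostics, which can narrow a failure down quickly. A minimal sketch of checking them ($page is simulated here with the kind of values a failed call yields):

```php
<?php
// Sketch: inspect the diagnostics that get_web_page() returns.
// $page is simulated with values typical of a failed call.
$page = array(
    'errno'     => 6,                        // 6 = CURLE_COULDNT_RESOLVE_HOST
    'errmsg'    => 'Could not resolve host',
    'http_code' => 0,
    'content'   => false,
);

if ($page['errno'] !== 0) {
    $diagnosis = "cURL error {$page['errno']}: {$page['errmsg']}";
} elseif ($page['http_code'] >= 400) {
    $diagnosis = "HTTP error {$page['http_code']}";
} else {
    $diagnosis = "OK";
}
echo $diagnosis . "\n"; // cURL error 6: Could not resolve host
```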
The thing I couldn't fathom is that it worked on all of my test servers but one. I've verified that cURL is available on the server in question. Also, setting $target_url = "http://www.google.com" worked fine. So I'm pretty positive the culprit has nothing to do with the cURL library.
Can it be because some servers block themselves from being "crawled" by this type of script? Or, maybe I just missed something here?
Thanks in advance.
Upvotes: 1
Views: 256
Reputation: 10887
It turned out that there's nothing wrong with the above script. And yes, $target_url = "http://" . $_SERVER['SERVER_NAME'] . "/press-release/index.html";
returned the intended value (as questioned by @ajreal in his answer).
The problem was actually in how the IP address of the target page was being resolved, which makes the answer to this question unrelated to PHP or Apache: when I ran the script on the server under test, the returned IP address wasn't accessible. Please refer to this more detailed explanation / discussion.
One takeaway: first try curl -v from the command line; its verbose output might give you useful clues.
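Since the root cause here was name resolution, you can also check how the host resolves from PHP itself. A minimal sketch (www.example.com stands in for the real host):

```php
<?php
// gethostbyname() returns its argument unchanged when the lookup fails,
// so compare the result against the input to detect failure.
$host = parse_url("http://www.example.com/press-release/index.html", PHP_URL_HOST);
$ip   = gethostbyname($host);

if ($ip === $host) {
    echo "DNS lookup failed for $host\n";
} else {
    echo "$host resolves to $ip\n";
}
```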
Upvotes: 0
Reputation: 2825
Try using HTTP_HOST instead of SERVER_NAME. They're not quite the same: HTTP_HOST comes from the request's Host header, while SERVER_NAME comes from the server configuration.
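A small sketch of the difference (the values below are simulated for illustration; on a live server the web server populates $_SERVER):

```php
<?php
// HTTP_HOST comes from the request's Host header; SERVER_NAME comes
// from the server configuration (e.g. Apache's ServerName directive).
// Simulated values -- on a live server these are set by the web server.
$_SERVER['HTTP_HOST']   = 'www.example.com';
$_SERVER['SERVER_NAME'] = 'example.com';

$target_url = "http://" . $_SERVER['HTTP_HOST'] . "/press-release/index.html";
echo $target_url . "\n"; // http://www.example.com/press-release/index.html
```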
Upvotes: 0
Reputation: 47321
$target_url = "http://" . $_SERVER['SERVER_NAME'] . "/press-release/index.html";
I'm not sure the above expression actually returns the correct URL for you;
this might be the cause of the whole problem.
Can it be because some servers block themselves from being "crawled" by this type of script?
Yes, it could be.
But I don't have the answer, because you did not include the implementation details.
This is your site, so you should be able to check.
In general, I would say this is a bad idea:
if you are trying to access another page on the same domain,
you can simply do file_get_contents(PATH_TO_FILE.'/press-release/index.html');
(judging by the .html extension, I assume it is a static page).
If that page requires some PHP processing,
you just need to prepare all the necessary variables and then require the file.
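The two alternatives above can be sketched as follows (runnable against temporary files; the paths and the $id variable are illustrative):

```php
<?php
// Illustrative: create sample "pages" so the sketch is runnable.
$dir = sys_get_temp_dir();
file_put_contents($dir . '/index.html', '<h1>Press releases</h1>');
file_put_contents($dir . '/index.php', '<?php echo "Release #$id";');

// Static page on the same server: read it straight from disk,
// no HTTP round-trip needed.
$html = file_get_contents($dir . '/index.html');

// Page that needs PHP processing: set up the variables it expects,
// then capture its output with output buffering.
$id = 42; // a variable the included script expects
ob_start();
require $dir . '/index.php';
$html = ob_get_clean();
echo $html . "\n"; // Release #42
```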
Upvotes: 2