madphp

Reputation: 1764

cURL weird status codes when checking URL

I'm checking for the presence of an XML sitemap on different URLs. If I supply the URL example.com/sitemap.xml and it 301-redirects to www.example.com/sitemap.xml, I get a 301, obviously. If www.example.com/sitemap.xml doesn't exist, I won't see the 404. So, if I get a 301, I execute another cURL request to see whether www.example.com/sitemap.xml returns a 404. But for some reason I get random 404 and 303 status codes.

private function check_http_status($domain, $file) {

    $url = $domain . "/" . $file;

    $curl = new Curl();
    $curl->url = $url;
    $curl->nobody = true;
    $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
    $curl->execute();
    $retcode = $curl->httpCode();

    if ($retcode == 301 || $retcode == 302) {

        $url = "www." . $domain . "/" . $file;

        $curl = new Curl();
        $curl->url = $url;
        $curl->nobody = true;
        $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
        $curl->execute();
        $retcode = $curl->httpCode();

    }

    return $retcode;

}

Upvotes: 0

Views: 1542

Answers (3)

youssefr

Reputation: 11

"followLocation" works very well. Here is how I implemented it:

$url = "http://www.YOURSITE.com/"; // Assign your URL here.

$ch = curl_init(); // Initialize cURL.
curl_setopt($ch, CURLOPT_URL, $url); // Pass the URL as the target.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 1 returns the response as a string; 0 prints it directly.
curl_setopt($ch, CURLOPT_HEADER, 0); // 0 leaves the response headers out of the output.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow any Location header the server sends.

$response_data = curl_exec($ch); // Execute the request against the target URL.
$http_header = curl_getinfo($ch); // Get information about the last transfer, i.e. our URL.
// Report the URLs that do not return 200 OK.
if ($http_header['http_code'] != 200) {
    echo " <b> PAGE NOT FOUND => </b>"; print $http_header['http_code'];
}
// print $http_header['url']; // The last effective URL, i.e. the page to which you were redirected.
print $url; // The original URL that you tried to access.

curl_close($ch); // We are done with cURL, so close the handle.

Upvotes: 0

Kami

Reputation: 19447

Have a look at the list of response codes returned - http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.

Usually a web browser will handle these automatically, but as you are doing things manually with cURL, you need to understand what each response means. A 301 or 302 means that you should use the alternative URL supplied in the response to access the resource. This may be as simple as adding www to the request, but it may also be more complex, such as a redirect to a different domain altogether.

A 303 (See Other) means that the response can be found at a different URL, given in the Location header, which should be retrieved with GET. It is typically sent in reply to a POST.
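As a rough illustration of the distinctions above, here is a minimal sketch of how a checker might decide what to do with each status code. The function name `next_step` and the returned strings are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: map a status code to the action a sitemap checker should take.
function next_step(int $code): string
{
    switch ($code) {
        case 301:
        case 302:
        case 307:
        case 308:
            // The resource lives elsewhere; the Location header says where.
            return 'retry at the Location URL';
        case 303:
            // See Other: fetch the Location URL with a retrieval request.
            return 'retry with GET (or HEAD) at the Location URL';
        case 404:
            return 'resource is missing';
        default:
            // Any 2xx counts as found; everything else needs a closer look.
            return ($code >= 200 && $code < 300) ? 'resource exists' : 'inspect manually';
    }
}
```

In every redirect case the next request goes to the URL the server supplied, not to a guessed one.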

Upvotes: 2

user1703809

Reputation:

Well, when you receive a 301 or 302 you should use the location found in the response, not just assume another location and try that.

As you can see in this example, the response from the server contains the new location of the file. Use that for your next request: http://en.wikipedia.org/wiki/HTTP_301#Example
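A minimal sketch of that approach, using PHP's cURL extension directly rather than the `Curl` wrapper class from the question (whose API is not shown here). The function names `extract_location` and `final_status` and the hop limit are hypothetical, and the sketch assumes the server sends an absolute URL in the Location header.

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive; /m lets ^ match at each line start.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: send HEAD requests, following Location by hand.
function final_status(string $url, int $maxHops = 5): int
{
    $code = 0;
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
        curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return them as a string
        $headers = curl_exec($ch);
        $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        $next = is_string($headers) ? extract_location($headers) : null;
        if ($code < 300 || $code >= 400 || $next === null) {
            break; // not a redirect, or nowhere to go
        }
        $url = $next; // assumption: Location holds an absolute URL
    }
    return $code;
}
```

Calling `final_status('http://example.com/sitemap.xml')` would then report the status of whatever URL the server actually redirects to, so a missing file at the redirect target shows up as a 404.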

Upvotes: 0
