user2301515

Reputation: 5117

PHP, getting a proper url and faster algorithm

I have a function that turns user input into a proper URL: example.com becomes http://example.com, www.example.org becomes https://example.org, and so on.

function startsWith($haystack, $needle) {
    return !strncmp($haystack, $needle, strlen($needle));
}

function properUrl($url) {
    $urls = array();
    if (startsWith($url, "https://") || startsWith($url, "http://")) {
        $urls[] = $url;
    } else if (startsWith($url, "www.")) {
        $url = substr($url, 4);
        $urls[] = "http://$url";
        $urls[] = "http://www.$url";
        $urls[] = "https://$url";
        $urls[] = "https://www.$url";
    } else {
        $urls[] = "http://$url";
        $urls[] = "http://www.$url";
        $urls[] = "https://$url";
        $urls[] = "https://www.$url";
    }
    foreach ($urls as $u) {         
        if (@file_get_contents($u)) {
            $url = $u;
            break;
        }
    }
    return $url;
}

What is a quicker alternative to file_get_contents? I only want to find the proper URL, not read the whole page. Thanks.

Upvotes: 0

Views: 105

Answers (1)

Damien Overeem

Reputation: 4529

Use PHP's parse_url(): http://php.net/manual/en/function.parse-url.php

Example:

<?php
$url = '//www.example.com/path?googleguy=googley';

// Prior to 5.4.7 this would show the path as "//www.example.com/path"
var_dump(parse_url($url));
?>

will give you:

array(3) {
  ["host"]=>
  string(15) "www.example.com"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(17) "googleguy=googley"
}

while:

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($url));

echo parse_url($url, PHP_URL_PATH);
?>

will give you:

Array
(
    [scheme] => http
    [host] => hostname
    [user] => username
    [pass] => password
    [path] => /path
    [query] => arg=value
    [fragment] => anchor
)
/path

As you can see, it is quite easy to check the array's keys for the values you need and build the rest of your URL from there. That saves a lot of string comparisons.
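
As a rough sketch of that idea (not a definitive implementation), the helper below uses parse_url() to build the same four candidates the question's function tries. The name candidateUrls() is hypothetical, and the sketch assumes PHP 5.4.7 or later, where a leading "//" is enough for parse_url() to recognise the host. It deliberately ignores port, user info and fragment.

```php
<?php
// Hypothetical helper: list candidate URLs for input such as "example.com".
function candidateUrls($input) {
    $input = trim($input);
    // parse_url() only reports a host when a scheme or a leading "//" is present,
    // so scheme-less input is prefixed with "//" before parsing.
    $hasScheme = (bool) preg_match('#^https?://#i', $input);
    $parts = parse_url($hasScheme ? $input : '//' . $input);
    if ($parts === false || empty($parts['host'])) {
        return array();
    }
    if ($hasScheme) {
        // The caller already chose a scheme, so there is nothing to guess.
        return array($input);
    }
    // Strip a leading "www." so both host variants can be generated.
    $host = preg_replace('/^www\./i', '', $parts['host']);
    $rest = isset($parts['path']) ? $parts['path'] : '';
    if (isset($parts['query'])) {
        $rest .= '?' . $parts['query'];
    }
    $urls = array();
    foreach (array('http', 'https') as $scheme) {
        $urls[] = "$scheme://$host$rest";
        $urls[] = "$scheme://www.$host$rest";
    }
    return $urls;
}

print_r(candidateUrls('www.example.org'));
```

The candidates come back in the same order as in the question (http before https, bare host before www), so the first one that responds can be returned.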

To check whether the URL exists, request only the headers instead of fetching the entire file (which is slow). PHP's get_headers() will do that for you:

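$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
// get_headers() returns false on failure; otherwise element 0 is the status line.
// Match on the status code only, since the server may answer with HTTP/1.0 or a different reason phrase.
if ($file_headers === false || strpos($file_headers[0], ' 404') !== false) {
    $exists = false;
} else {
    $exists = true;
}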
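
A possible variation, sketched under a few assumptions: get_headers() sends a GET request by default, so passing a stream context with the HEAD method asks the server not to send the body at all. The third (context) argument of get_headers() requires PHP 7.1 or later. The helper names statusCode() and urlResponds(), the 5 second timeout, and the choice to treat any 2xx or 3xx status as "responds" are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line
// such as "HTTP/1.1 404 Not Found". Returns 0 if the line is not recognised.
function statusCode($statusLine) {
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical helper: true if a HEAD request to $url gets a 2xx or 3xx response.
function urlResponds($url, $timeout = 5) {
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'HEAD',    // ask for headers only
            'timeout' => $timeout,  // seconds to wait before giving up
        ),
    ));
    // The context argument is available from PHP 7.1.
    $headers = @get_headers($url, 0, $context);
    if ($headers === false) {
        return false;
    }
    // Element 0 is the status line of the first response.
    $code = statusCode($headers[0]);
    return $code >= 200 && $code < 400;
}
```

In the question's loop, @file_get_contents($u) could then be replaced with urlResponds($u). Some servers reject HEAD requests, so falling back to a GET for those cases may be worth considering.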

Good luck!

Upvotes: 1
