symcbean

Reputation: 48357

PHP normalization of URL to identical form

I want to check URLs against a list to decide how to process them (this will be inspecting data streams, not acting as a router in an application), but HTTP makes it very easy to represent the same URL in many different ways, e.g. (adapted from RFC 2616):

http://example.com/~smith/home.html
http://example.com:80/~smith/home.html
http://EXAMPLE.com/%7Esmith/home.html
http://EXAMPLE.COM/%7esmith/home.html

all represent the same target resource.

I need a way to translate a URL to a canonical form.

Is there an easy way to do this consistently?

(It appears that parse_url() does none of this normalization.)
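A quick check seems to confirm this. The snippet below is only an illustration; the URL is a made-up combination of the variants listed above (uppercase host, explicit default port, lowercase escape):

```php
<?php
// parse_url() only splits the string into components;
// it does not change case, drop default ports, or decode escapes.
$parts = parse_url('http://EXAMPLE.COM:80/%7esmith/home.html');

var_dump($parts['host']); // "EXAMPLE.COM" (case preserved)
var_dump($parts['port']); // 80 (default port kept)
var_dump($parts['path']); // "/%7esmith/home.html" (escape left as-is)
```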

Upvotes: 5

Views: 2848

Answers (1)

Maksym Fedorov

Reputation: 6456

You can use the glenscott/url-normalizer package, which normalizes URLs in accordance with RFC 3986. The following simple example shows the result of normalization:

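// Assumes the package was installed with Composer,
// e.g. composer require glenscott/url-normalizer
require __DIR__ . '/vendor/autoload.php';

$urls = [
    'http://example.com/~smith/home.html',
    'http://example.com:80/~smith/home.html',
    'http://EXAMPLE.com/%7Esmith/home.html',
    'http://EXAMPLE.COM/%7esmith/home.html',
    'https://example.com:443/~smith/home.html'
];

foreach ($urls as $url) {
    // Each Normalizer instance wraps a single URL.
    $normalizer = new URL\Normalizer($url);
    echo $normalizer->normalize(), "<br>";
}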

The result:

http://example.com/~smith/home.html
http://example.com/~smith/home.html
http://example.com/~smith/home.html
http://example.com/~smith/home.html
https://example.com/~smith/home.html
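If adding a dependency is not an option, the same idea can be sketched by hand on top of parse_url(). This is a minimal sketch, not the package's implementation: normalizeUrl() is a hypothetical helper name, and it only covers lowercasing the scheme and host, dropping default ports for http/https, and cleaning up percent-encoding. It ignores user info and fragments and does not resolve "." or ".." path segments.

```php
<?php
// Hypothetical helper, not part of any library: a minimal RFC 3986 style
// normalizer covering case, default ports, and percent-encoding only.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    // Scheme and host are case-insensitive, so lowercase both.
    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);

    // Drop the port when it is the default for the scheme.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = '';
    if (isset($parts['port']) && ($defaultPorts[$scheme] ?? null) !== $parts['port']) {
        $port = ':' . $parts['port'];
    }

    // Decode escapes of unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~")
    // and uppercase the hex digits of every escape that remains.
    $fixEscapes = static function (string $s): string {
        return preg_replace_callback(
            '/%([0-9A-Fa-f]{2})/',
            static function (array $m): string {
                $char = chr(hexdec($m[1]));
                return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1
                    ? $char
                    : '%' . strtoupper($m[1]);
            },
            $s
        );
    };

    // An empty path is treated as "/".
    $path  = $fixEscapes($parts['path'] ?? '/');
    $query = isset($parts['query']) ? '?' . $fixEscapes($parts['query']) : '';

    return $scheme . '://' . $host . $port . $path . $query;
}
```

Calling normalizeUrl() on each of the five URLs above should give the same strings as the package output, but for anything beyond these simple cases the package is the safer choice.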

Upvotes: 6
