Reputation: 48357
I want to check URLs against a list to decide how to process them (this will be inspecting data streams, not acting as a router in an application), but HTTP makes it very easy to represent the same URL in lots of different ways, e.g. (adapted from RFC 2616):
http://example.com/~smith/home.html
http://example.com:80/~smith/home.html
http://EXAMPLE.com/%7Esmith/home.html
http://EXAMPLE.COM/%7esmith/home.html
all represent the same target resource.
I want a way to translate a URL into a canonical form...
Is there an easy way to do this consistently?
(It appears that parse_url() does none of these.)
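For illustration, a quick check of what parse_url() returns for the last URL above. This is only a small sketch; the variable name $parts is arbitrary. It shows that the function splits the URL into components but leaves case, the default port, and the percent-escape untouched:

```php
<?php
// parse_url() only splits the URL into components; it does not
// lowercase the host, drop the default port, or decode escapes.
$parts = parse_url('http://EXAMPLE.COM:80/%7esmith/home.html');

echo $parts['host'], "\n"; // EXAMPLE.COM
echo $parts['port'], "\n"; // 80
echo $parts['path'], "\n"; // /%7esmith/home.html
```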
Upvotes: 5
Views: 2848
Reputation: 6456
You can use the glenscott/url-normalizer package to normalize URLs in accordance with RFC 3986. The following simple example shows the result of normalization:
// Assumes the package was installed with Composer; adjust the path as needed.
require 'vendor/autoload.php';

$urls = [
    'http://example.com/~smith/home.html',
    'http://example.com:80/~smith/home.html',
    'http://EXAMPLE.com/%7Esmith/home.html',
    'http://EXAMPLE.COM/%7esmith/home.html',
    'https://example.com:443/~smith/home.html'
];

// Normalize each URL and print one result per line.
foreach ($urls as $url) {
    $normalizer = new URL\Normalizer($url);
    echo $normalizer->normalize(), "<br>";
}
The result:
http://example.com/~smith/home.html
http://example.com/~smith/home.html
http://example.com/~smith/home.html
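If adding a package is not an option, the same rules can be approximated by hand. The following is only a minimal sketch, not the package's implementation: normalizeUrl() is a hypothetical helper name, and it covers just the cases from the question (lowercasing scheme and host, dropping a default port, and normalizing percent-escapes in the path). It ignores userinfo and fragments, leaves the query string as-is, and does not remove dot-segments.

```php
<?php
// Hypothetical helper (not part of any library): applies a few of the
// RFC 3986 normalizations by hand on top of parse_url().
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave unparseable or relative input unchanged
    }

    // Scheme and host are case-insensitive, so lowercase both.
    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);

    // Drop the port when it is the default for the scheme.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = $parts['port'] ?? null;
    if ($port !== null && ($defaultPorts[$scheme] ?? null) === $port) {
        $port = null;
    }

    // Decode escapes of unreserved characters (ALPHA / DIGIT / - . _ ~)
    // and uppercase the hex digits of every other escape.
    $path = preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        static function (array $m): string {
            $char = chr(hexdec($m[1]));
            return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1
                ? $char
                : '%' . strtoupper($m[1]);
        },
        $parts['path'] ?? '/' // an empty path is treated as "/"
    );

    // Reassemble; userinfo and fragment are deliberately dropped here.
    $result = $scheme . '://' . $host;
    if ($port !== null) {
        $result .= ':' . $port;
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}

// prints http://example.com/~smith/home.html
echo normalizeUrl('http://EXAMPLE.COM:80/%7esmith/home.html'), "\n";
```

With this sketch, all four URLs from the question map to the same string, so they can be compared directly against a list of normalized entries.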
Upvotes: 6