I am trying to parse URLs in PHP where the input could be any of the following:
Code:
$info = parse_url('http://www.domainname.com/');
print_r($info);
$info = parse_url('www.domain.com');
print_r($info);
$info = parse_url('/test/');
print_r($info);
$info = parse_url('test.php');
print_r($info);
Returns:
Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [path] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => test.php
)
The problem, as you can see, is the second example, where the domain is returned as a path.
To handle a URL in a way that preserves the fact that it was a scheme-less URL, whilst also allowing the domain to be identified, use the following code:
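// Prepend "//" unless the URL starts with a valid scheme followed by ":" or with "/".
// The alternation is grouped inside the ^ anchor so "/" only matches at the start.
if (!preg_match('/^([a-z][a-z0-9\-\.\+]*:|\/)/i', $url)) {
    $url = '//' . $url;
}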
So this prepends "//" to the URL only if the URL does not start with a valid scheme and does not begin with "/".
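A minimal sketch wrapping that check in a function, so the effect on the question's inputs is easy to see. The function name `toSchemeRelative` is hypothetical. Note that the alternation sits inside the `^` anchor, so an input such as `www.domain.com/a/b` (which contains a "/" later on) still gets the prefix.

```php
<?php
// Hypothetical helper name. Prepends "//" unless the string already starts
// with "scheme:" or with "/", so parse_url() reads the first segment as the
// host instead of as a path.
function toSchemeRelative(string $url): string
{
    if (!preg_match('/^([a-z][a-z0-9\-\.\+]*:|\/)/i', $url)) {
        $url = '//' . $url;
    }
    return $url;
}

foreach (['http://www.domainname.com/', 'www.domain.com', '/test/'] as $input) {
    print_r(parse_url(toSchemeRelative($input)));
}
```

Because no scheme is added, the result for `www.domain.com` has a `host` key but no `scheme` key, which is what "preserves that it was a scheme-less URL" means here. A bare file name such as `test.php` is still treated as a host by this rule.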
Some quick background on this:
The parser treats the (valid) characters before ":" as the scheme, and the characters following "//" as the domain. To indicate that the URL has both a scheme and a domain, the two markers must be used consecutively, as "://". For example:
[scheme]:[path//path]
//[domain][/path]
[scheme]://[domain][/path]
[/path]
[path]
This is how PHP parses URLs with parse_url(), but I couldn't say whether it conforms to the standard.
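The patterns listed above can be checked directly against parse_url(). A small sketch with one made-up sample string per pattern (the sample strings are illustrative assumptions, not taken from the answer):

```php
<?php
// One illustrative input per pattern in the list above.
$samples = [
    'mailto:a//b',                 // [scheme]:[path//path]
    '//www.domain.com/path',       // //[domain][/path]
    'http://www.domain.com/path',  // [scheme]://[domain][/path]
    '/path',                       // [/path]
    'path',                        // [path]
];

foreach ($samples as $url) {
    echo $url, ' => ';
    var_export(parse_url($url));
    echo PHP_EOL;
}
```

Only the two inputs containing "//" right after the start or right after "scheme:" produce a `host` key; in `mailto:a//b` the "//" is part of the path.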
The rule for a valid scheme name is: alpha *( alpha | digit | "+" | "-" | "." )
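That grammar translates directly into a regular expression. A minimal sketch, assuming "alpha" means ASCII letters of either case (schemes are case-insensitive); the function name `isValidScheme` is hypothetical:

```php
<?php
// Hypothetical helper: checks a candidate scheme name against
// alpha *( alpha | digit | "+" | "-" | "." ), case-insensitively.
function isValidScheme(string $scheme): bool
{
    // \z anchors at the true end of the string ($ would also accept a trailing "\n").
    return preg_match('/^[a-z][a-z0-9+.\-]*\z/i', $scheme) === 1;
}

var_dump(isValidScheme('svn+ssh'));
var_dump(isValidScheme('1http'));
```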
This gives the right results, but a file path needs to start with a slash:
parse('http://www.domainname.com/');
parse('www.domain.com');
parse('/test/');
parse("/file.php");
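function parse($url) {
    // Prepend "http://" when there is no "://" and the input does not start with "/"
    if (strpos($url, "://") === false && substr($url, 0, 1) != "/") {
        $url = "http://" . $url;
    }
    $info = parse_url($url);
    // parse_url() returns false for seriously malformed URLs
    if ($info) {
        print_r($info);
    }
}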
and the result is:
Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [scheme] => http
    [host] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => /file.php
)
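If you need the parsed parts rather than printed output, the same rule can return the array instead. A minimal sketch; the function name `parseUrlWithDefaultScheme` and its `$defaultScheme` parameter are hypothetical additions, not part of the answer:

```php
<?php
// Hypothetical variant of parse(): returns the parsed array (or false on a
// seriously malformed URL) instead of printing it.
function parseUrlWithDefaultScheme(string $url, string $defaultScheme = 'http')
{
    // Same rule as parse(): add a scheme only when there is no "://"
    // and the input does not start with "/".
    if (strpos($url, '://') === false && substr($url, 0, 1) !== '/') {
        $url = $defaultScheme . '://' . $url;
    }
    return parse_url($url);
}

$info = parseUrlWithDefaultScheme('www.domain.com');
echo $info['host'], PHP_EOL;
```

As with parse(), a bare file name such as `test.php` comes back as a host, so file paths still need a leading slash.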