Reputation: 5653
I have been trying to find an effective URL parser, but PHP's own parse_url() does not return the subdomain or the extension as separate parts. On php.net, several users contributed this version:
```php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?" . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters
    preg_match($r, $url, $out);
    return $out;
}
```
Unfortunately, it fails on paths that contain a '-', and I can't for the life of me work out how to amend it to accept '-' in the path.
Thanks
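For reference, a small sketch of what the built-in parse_url() returns; the URL is an illustrative example, not from the question. The host comes back as one string, which is why the subdomain and extension have to be split out separately.

```php
<?php
// Illustrative URL with credentials, a subdomain, a port and a hyphenated path
$parts = parse_url('https://user:secret@blog.example.com:8080/my-dir/page.php?a=1#top');
print_r($parts);
// Keys returned: scheme, host, port, user, pass, path, query, fragment.
// There is no 'subdomain' or 'extension' key; both are inside 'host'.
```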
Upvotes: 0
Views: 1144
Reputation: 30170
Try this:
```php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?" . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/-]*/(?P<file>[\w-]+(?:\.\w+)?)?)?"; // '-' added to path and file classes
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters
    preg_match($r, $url, $out);
    return $out;
}
```
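I added dashes to the character classes for the path and the file name ([\w/-] and [\w-]). Because the '-' is the last character in each class, it is matched literally rather than read as a range.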
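To see the effect in isolation, here is a sketch that runs only the path portion of the original and the amended pattern (anchored with ^ for a direct comparison) against a made-up hyphenated path. The original class stops at the first '-', so the match silently ends there instead of failing outright.

```php
<?php
// Only the path/file portion of each pattern; '^' anchors it at the start of the input.
$original = '!^(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?!';
$withDash = '!^(?P<path>[\w/-]*/(?P<file>[\w-]+(?:\.\w+)?)?)?!';
$path = '/my-dir/my-file.html'; // illustrative input

preg_match($original, $path, $before);
preg_match($withDash, $path, $after);

echo $before['path'], "\n"; // /my   (truncated at the first '-')
echo $after['path'], "\n";  // /my-dir/my-file.html
echo $after['file'], "\n";  // my-file.html
```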
Upvotes: 1
Reputation: 3599
It's much easier to use the built-in parse_url() function and then extract the subdomain from the 'host' element of the array it returns.
Example:
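```php
$url = 'http://username:password@sub.example.com/path?arg=value#anchor'; // host is illustrative
$urlInfo = parse_url($url);
$host = $urlInfo['host'];                          // "sub.example.com"
// Naive split: assumes one subdomain label and a single-label TLD
$subdomain = substr($host, 0, strpos($host, '.')); // "sub"
$tld = substr($host, strrpos($host, '.') + 1);     // "com"
```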
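A slightly more defensive sketch of the same idea. splitHost() is a hypothetical helper name, not a PHP function; it uses parse_url() with PHP_URL_HOST and assumes a single-label extension such as "com". It does not handle multi-part suffixes like "co.uk" (for "example.co.uk" it would report "example" as the subdomain); that needs a Public Suffix List lookup.

```php
<?php
/**
 * Hypothetical helper: splits the host of a URL into subdomain, domain and
 * extension. Assumes the extension is a single label (e.g. "com").
 */
function splitHost(string $url): ?array
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component (relative URL) or unparseable URL
    }
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return null; // e.g. "localhost": nothing to split
    }
    $extension = array_pop($labels); // last label, e.g. "com"
    $name = array_pop($labels);      // label before it, e.g. "example"
    return [
        'subdomain' => implode('.', $labels), // "" when there is no subdomain
        'domain'    => $name . '.' . $extension,
        'extension' => $extension,
    ];
}

print_r(splitHost('http://a.b.example.com/some-path'));
print_r(splitHost('http://example.com/'));
```

Unlike the one-liner above, it returns an empty subdomain for a bare domain such as example.com instead of reporting "example" as the subdomain.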
Upvotes: 1