Reputation: 12897
I have a URL like this:
I want to split that URL to get only the host part. For that I am using
parse_url($url,PHP_URL_HOST);
It returns www.w3schools.com, but I want only 'w3schools.com'. Is there a function for that, or do I have to do it manually?
Upvotes: 0
Views: 482
Reputation: 300825
There are many ways you could do this. A simple replace is the fastest if you know you always want to strip off 'www.':
$stripped=str_replace('www.', '', $domain);
A regex replace lets you bind that match to the start of the string:
$stripped=preg_replace('/^www\./', '', $domain);
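str_replace() is not anchored, so the two snippets above can differ when 'www.' also occurs later in the host. The following is a minimal sketch contrasting them. The helper names and the second sample host are made up for illustration.

```php
<?php
// Hypothetical helper names wrapping the two snippets above.
function stripWwwAnywhere(string $domain): string
{
    // str_replace() removes every 'www.' it finds, wherever it occurs
    return str_replace('www.', '', $domain);
}

function stripLeadingWww(string $domain): string
{
    // ^ anchors the match, so only a leading 'www.' is removed
    return preg_replace('/^www\./', '', $domain);
}

foreach (['www.w3schools.com', 'mail.www.example.com'] as $host) {
    echo $host, ' => ', stripWwwAnywhere($host), ' | ', stripLeadingWww($host), "\n";
}
```

Both helpers turn "www.w3schools.com" into "w3schools.com". For "mail.www.example.com", str_replace() yields "mail.example.com", while the anchored regex leaves the host unchanged.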
If it's always the first part of the domain you want to remove, regardless of whether it's www, you could use explode/implode. Though it's easy to read, it's the least efficient method:
$parts=explode('.', $domain);
array_shift($parts); //eat first element
$stripped=implode('.', $parts);
A regex achieves the same goal more efficiently:
$stripped=preg_replace('/^\w+\./', '', $domain);
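The explode/implode and /^\w+\./ snippets agree on typical hosts, but not on every input. For ASCII input, \w matches only letters, digits and '_', so a first label containing '-' is left in place. explode() turns a host with no '.' into an empty string. Here is a sketch with hypothetical helper names:

```php
<?php
// Hypothetical helpers wrapping the explode/implode and regex snippets above.
function dropFirstLabelExplode(string $domain): string
{
    $parts = explode('.', $domain);
    array_shift($parts);          // discard the first label
    return implode('.', $parts);  // a host without '.' ends up as ''
}

function dropFirstLabelRegex(string $domain): string
{
    // \w covers letters, digits and '_' (not '-'), so 'www-139.' does not match
    return preg_replace('/^\w+\./', '', $domain);
}

foreach (['www.example.co.uk', 'www-139.in.ibm.com', 'localhost'] as $host) {
    printf("%s: explode=%s regex=%s\n", $host, dropFirstLabelExplode($host), dropFirstLabelRegex($host));
}
```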
Now you might imagine that the following would be more efficient than the above regex:
$period=strpos($domain, '.');
if ($period!==false)
{
$stripped=substr($domain,$period+1);
}
else
{
$stripped=$domain; //there was no period
}
But I benchmarked it and found that, over a million iterations, the preg_replace version consistently beat it. Typical results, normalized to the fastest (so it has a unitless time of 1):
/^\w+\./ : 1.494
The above code samples always strip the first domain component, so they work fine on domains like "www.example.com" and "www.example.co.uk" but not on "example.com" or "www.department.example.com". If you need to handle domains that may already be the main domain, or that have multiple subdomains (such as "foo.bar.baz.example.com"), and want to reduce them to just the main domain ("example.com"), try the following. The first sample in each approach returns only the last two domain components, so it won't work with "co.uk"-like domains.
explode:
$parts = explode('.', $domain);
$parts = array_slice($parts, -2);
$stripped = implode('.', $parts);
Since explode is consistently the slowest approach, there's little point in writing a version that handles "co.uk".
regex:
$stripped=preg_replace('/^.*?([^.]+\.[^.]*)$/', '$1', $domain);
This captures the final two parts from the domain and replaces the full string value with the captured part. With multiple subdomains, all the leading parts get stripped.
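The following is a quick sketch, with hypothetical helper names, checking that the explode/array_slice version and the regex give the same result. That includes the "co.uk" limitation mentioned earlier.

```php
<?php
// Hypothetical helpers: keep only the last two labels of a host.
function lastTwoLabelsExplode(string $domain): string
{
    return implode('.', array_slice(explode('.', $domain), -2));
}

function lastTwoLabelsRegex(string $domain): string
{
    // The lazy .*? stops at the earliest point where the rest can match,
    // so $1 holds the complete last two labels
    return preg_replace('/^.*?([^.]+\.[^.]*)$/', '$1', $domain);
}

foreach (['foo.bar.baz.example.com', 'example.com', 'www.example.co.uk'] as $host) {
    echo $host, ' => ', lastTwoLabelsExplode($host), ' | ', lastTwoLabelsRegex($host), "\n";
}
```

Both helpers reduce "foo.bar.baz.example.com" to "example.com". Both also reduce "www.example.co.uk" to "co.uk".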
To work with ".co.uk"-like domains as well as a variable number of subdomains, try:
$stripped=preg_replace('/^.*?([^.]+\.(?:[^.]*|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
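The ".co.uk" handling is a heuristic: any host ending in two two-character labels is treated like ".co.uk". The following sketch (with a hypothetical helper name) shows both the intended behaviour and a case where the heuristic misfires.

```php
<?php
// Hypothetical helper wrapping the ".co.uk"-aware regex above.
function mainDomain(string $domain): string
{
    return preg_replace('/^.*?([^.]+\.(?:[^.]*|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
}

foreach (['foo.bar.example.com', 'www.example.co.uk', 'example.co.uk', 'www.ab.de'] as $host) {
    echo $host, ' => ', mainDomain($host), "\n";
}
```

"www.ab.de" comes back unchanged, because "ab.de" itself looks like a ".co.uk"-style suffix to the pattern.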
str:
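$last = strrpos($domain, '.');
// A negative offset stops strrpos() at the character before the last '.',
// so this finds the previous '.'; the guard avoids an out-of-range offset
// when the domain has no '.' or starts with one.
$period = ($last !== false && $last > 0) ? strrpos($domain, '.', $last - strlen($domain) - 1) : false;
if ($period !== false) {
$stripped = substr($domain,$period+1);
} else {
$stripped = $domain;
}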
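The str version depends on how strrpos() treats a negative offset. For a one-character needle, a match may start no later than strlen($domain) + $offset, so the tail of the string is skipped. The following is a small sketch of that behaviour; the hosts are only examples.

```php
<?php
$domain = 'www.example.com';                 // '.' at indexes 3 and 11, length 15
var_dump(strrpos($domain, '.'));             // int(11): the last '.'
var_dump(strrpos($domain, '.', -5));         // int(3): a match may start at index 15 - 5 = 10 at most
var_dump(strrpos('example.com', '.', -5));   // bool(false): only indexes 0..6 are searched
```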
Allowing for co.uk domains:
$len = strlen($domain);
if ($len < 7) {
$stripped = $domain;
} else {
if ($domain[$len-3] === '.' && $domain[$len-6] === '.') {
$offset = -7;
} else {
$offset = -5;
}
$period = strrpos($domain, '.', $offset);
if ($period !== FALSE) {
$stripped = substr($domain,$period+1);
} else {
$stripped = $domain;
}
}
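The str-based ".co.uk" version uses the same two-letter heuristic as the ".co.uk" regex, so the two agree on typical hosts. They may not agree on every input; for instance, the str version returns hosts shorter than 7 characters unchanged. The following sketch wraps both (hypothetical helper names) and compares them on a few sample hosts.

```php
<?php
// Hypothetical wrappers around the ".co.uk"-aware regex and str snippets above.
function mainDomainRegex(string $domain): string
{
    return preg_replace('/^.*?([^.]+\.(?:[^.]*|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
}

function mainDomainStr(string $domain): string
{
    $len = strlen($domain);
    if ($len < 7) {
        return $domain;
    }
    // An "xx.yy" ending means the search has to reach one label further back
    $offset = ($domain[$len - 3] === '.' && $domain[$len - 6] === '.') ? -7 : -5;
    $period = strrpos($domain, '.', $offset);
    return $period === false ? $domain : substr($domain, $period + 1);
}

$hosts = ['example.com', 'www.example.com', 'foo.bar.example.com',
          'example.co.uk', 'www.example.co.uk', 'a.b.c.example.co.uk'];
foreach ($hosts as $host) {
    $r = mainDomainRegex($host);
    $s = mainDomainStr($host);
    printf("%-22s %-15s %s\n", $host, $r, $r === $s ? 'match' : "MISMATCH ($s)");
}
```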
The regex and str-based implementations can be made ever-so-slightly faster by sacrificing edge cases where the primary domain component is very short (e.g. "www.a.com"; the regex below also leaves two-letter components such as "www.ab.com" untouched):
regex:
$stripped=preg_replace('/^.*?([^.]{3,}\.(?:[^.]+|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
str:
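// strrpos() rejects an offset larger than the string length, so short domains are returned unchanged
$period = strlen($domain) >= 7 ? strrpos($domain, '.', -7) : false;
if ($period !== FALSE) {
$stripped = substr($domain,$period+1);
} else {
$stripped = $domain;
}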
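The following sketch (hypothetical helper names) shows what the faster variants give up. It wraps the fast regex and the fast strrpos version. A length check is added to the latter because strrpos() rejects an offset larger than the string.

```php
<?php
function fastMainDomainRegex(string $domain): string
{
    // Requires at least three characters in the primary domain component
    return preg_replace('/^.*?([^.]{3,}\.(?:[^.]+|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
}

function fastMainDomainStr(string $domain): string
{
    // Only a '.' outside the last six characters can match
    $period = strlen($domain) >= 7 ? strrpos($domain, '.', -7) : false;
    return $period === false ? $domain : substr($domain, $period + 1);
}

foreach (['www.example.com', 'www.example.co.uk', 'www.ab.com', 'www.a.com'] as $host) {
    echo $host, ' => ', fastMainDomainRegex($host), ' | ', fastMainDomainStr($host), "\n";
}
```

The two variants disagree on "www.ab.com": the regex leaves it unchanged, while the str version returns "ab.com". Both leave "www.a.com" unchanged.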
Though the behavior is changed, the rankings aren't (most of the time). Here they are, with times normalized to the quickest.
Here, the difference between times is so small that it wasn't unusual for the rankings to change between runs. The fast .co.uk regex, for example, often beat the basic multiple-subdomain regex. Thus, the exact implementation shouldn't have a noticeable impact on speed. Instead, pick one based on simplicity and clarity. As long as you don't need to handle .co.uk domains, that would be the multiple-subdomain regex approach.
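If you want to re-run a comparison like this on your own PHP version, here is a rough micro-benchmark sketch. It assumes PHP 7.4+ (arrow functions; hrtime() needs 7.3). The candidate list, sample host and iteration count are arbitrary choices. Closure-call overhead is included in every timing, so treat the ratios as approximate.

```php
<?php
// Times each candidate and returns times normalized to the fastest (fastest = 1).
function benchmark(array $candidates, string $domain, int $iterations): array
{
    $times = [];
    foreach ($candidates as $name => $fn) {
        $start = hrtime(true);               // monotonic clock, nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $fn($domain);
        }
        $times[$name] = max(1, hrtime(true) - $start);   // avoid dividing by zero
    }
    $fastest = min($times);
    return array_map(fn($t) => $t / $fastest, $times);
}

$candidates = [
    'preg_replace /^\w+\./' => fn($d) => preg_replace('/^\w+\./', '', $d),
    'strpos + substr' => function ($d) {
        $period = strpos($d, '.');
        return $period === false ? $d : substr($d, $period + 1);
    },
    'explode/implode' => function ($d) {
        $parts = explode('.', $d);
        array_shift($parts);
        return implode('.', $parts);
    },
];

foreach (benchmark($candidates, 'www.example.com', 1000000) as $name => $ratio) {
    printf("%-24s %.3f\n", $name, $ratio);
}
```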
Upvotes: 6
Reputation: 17225
You need to strip off all characters up to and including the first occurrence of the [.] character if, and only if, there is more than one occurrence of [.] in the returned string.
For example, if the returned string is www-139.in.ibm.com, the regular expression should return in.ibm.com, since that would be the domain.
If the returned string is music.domain.com, the regular expression should return domain.com.
In rare cases you can access the site without the server prefix, that is, using http://domain.com/pageurl. In that case you get the domain directly as domain.com, and the regex should not strip anything.
IMO this should be the pseudo-logic of the regex. If you want, I can write a regex for you that covers these cases.
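The logic described above can be expressed as a single regex with a lookahead. This is a sketch rather than a regex given in the answer, and the helper name is hypothetical.

```php
<?php
// Remove the first label and its '.', but only if another '.' follows it.
function stripServerPrefix(string $host): string
{
    // (?=[^.]*\.) requires a second '.' somewhere after the first one
    return preg_replace('/^[^.]+\.(?=[^.]*\.)/', '', $host);
}

foreach (['www-139.in.ibm.com', 'music.domain.com', 'domain.com'] as $host) {
    echo $host, ' => ', stripServerPrefix($host), "\n";
}
```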
Upvotes: 0
Reputation: 83622
You have to strip off the subdomain part by yourself - there is no built-in function for this.
// $domain being www.w3schools.com
$domain = implode('.', array_slice(explode('.', $domain), -2));
The above example also works for subdomains of any depth, as it always returns the last two domain parts (domain and top-level domain).
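Combining this with parse_url() from the question gives a minimal sketch that goes straight from a URL to its last two host labels. The helper name and sample URLs are hypothetical. The helper returns null when the URL has no host.

```php
<?php
function mainDomainFromUrl(string $url): ?string
{
    // null if the URL has no host part, false if it is seriously malformed
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Keep the last two labels, as in the line above (no "co.uk" handling)
    return implode('.', array_slice(explode('.', $host), -2));
}

var_dump(mainDomainFromUrl('http://www.w3schools.com/html/default.asp'));
var_dump(mainDomainFromUrl('/relative/path'));
```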
If you only want to strip off www., you can simply use str_replace(), which will indeed be faster:
$domain = str_replace('www.', '', $domain);
Upvotes: 0