Reputation: 23
function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // Use the parsed host when present; otherwise fall back to the first path segment.
    return trim(!empty($parseUrl['host'])
        ? $parseUrl['host']
        : explode('/', $parseUrl['path'] ?? '', 2)[0]
    );
}
$httpreferer = getHost($_SERVER['HTTP_REFERER']);
$httpreferer = preg_replace('#^(http(s)?://)?w{3}\.#', '$1', $httpreferer);
echo $httpreferer;
I am using this to strip http://, www and subdomains so that only the host is returned. However, it returns the following:
http://site.google.com ==> google.com
http://google.com ==> com
How do I get it to remove the subdomain only when one exists, instead of stripping down to the TLD when there is none?
Upvotes: 0
Views: 135
Reputation: 21661
Start with parse_url, specifically parse_url($url)['host']:
$arr = parse_url($url);
echo preg_replace('/^www\./', '', $arr['host'])."\n";
Output
site.google.com
google.com
The regex for this just matches www. when it is at the start of the string. You could probably do this part a few other ways as well.
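One non-regex way to drop a leading www., as a rough sketch only. The helper name stripWww is made up for illustration and is not part of the answer above; it assumes the host has already been pulled out with parse_url.

```php
<?php
// Hypothetical helper (not from the answer): drop a leading "www." without a regex.
function stripWww(string $host): string
{
    // strncasecmp compares only the first 4 bytes, ignoring case.
    if (strncasecmp($host, 'www.', 4) === 0) {
        return substr($host, 4);
    }
    return $host;
}

echo stripWww('www.google.com'), "\n";  // google.com
echo stripWww('site.google.com'), "\n"; // site.google.com
```

Unlike the regex version, this one also accepts WWW. in upper case, which may or may not be what you want.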
No subdomain
If you don't want any subdomain at all:
$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host)."\n";
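To see what that pattern does on a few hosts, here is a small sketch. The wrapper name stripOneLabel and the sample hosts are assumptions for the demo; the pattern itself is the one from the snippet above.

```php
<?php
// Pattern copied from the answer; stripOneLabel is a hypothetical wrapper name.
function stripOneLabel(string $host): string
{
    return preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host);
}

foreach (['site.google.com', 'google.com', 'a.b.google.com'] as $host) {
    echo $host, ' => ', stripOneLabel($host), "\n";
}
// site.google.com => google.com
// google.com => google.com
// a.b.google.com => b.google.com
```

Note that it removes at most one leading label, so a host with two subdomain levels still keeps one of them.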
No subdomain, no Country Code
$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2', $host)."\n";
How it works:
It is the same as the previous one, except that the domain name is captured separately from the part after it. Instead of capturing everything after the first dot, the second group captures a dot followed by everything that is not a dot, ([^.]+). Outside the groups, .*? matches whatever is left (confusingly, the . means "any character" here): * means zero or more times, and ? makes it non-greedy, so it takes as few characters as possible and never steals characters from the groups before it.
Put another way: match anything left over, zero or more times. For www.google.com there is nothing after .com, so it matches zero characters. For www.google.com.uk it matches the .uk, which is dropped because it sits outside the capture groups.
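If it helps to see the groups directly, preg_match exposes them. This is only a sketch: captureParts is a made-up helper name, and the pattern is the "no subdomain, no country code" one from this answer.

```php
<?php
// Same pattern as the answer; captureParts is a hypothetical helper name.
function captureParts(string $host): array
{
    $pattern = '/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/';
    if (!preg_match($pattern, $host, $m)) {
        return [];
    }
    // $m[1] is the first capture group, $m[2] is the second.
    return [$m[1], $m[2]];
}

echo implode('', captureParts('www.google.com')), "\n";    // google.com
echo implode('', captureParts('www.google.com.uk')), "\n"; // google.com (".uk" goes to .*?)
echo implode('', captureParts('google.co.uk')), "\n";      // co.uk
```

The last line shows a limit of the approach: with no subdomain in front, the optional first group swallows "google." and the result is co.uk, so this pattern is best treated as a starting point.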
Single Line Answer.
PHP 5.4 and newer support array dereferencing on a function's return value, so you can do this:
$host = parse_url($url)['host'];
So taking the last example we can compress that into one line and remove the variable assignment.
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2',parse_url($url)['host'])."\n";
That was just for fun!
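Another way to skip the array step, sketched here as an option rather than a recommendation: parse_url takes an optional second argument, and PHP_URL_HOST makes it return just the host. The function name hostOf and the sample URLs are assumptions for the example.

```php
<?php
// hostOf is a hypothetical helper name used only for this sketch.
function hostOf(string $url): string
{
    // With PHP_URL_HOST, parse_url returns the host as a string,
    // or null when the URL has no host component.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : '';
}

echo preg_replace('/^www\./', '', hostOf('http://www.google.com/search?q=php')), "\n"; // google.com
```

This also avoids an undefined index notice when the input has no host, for example a bare path.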
Summary
Using parse_url is really the "correct" way to do it, or at least the proper way to start, as it removes a lot of the other "stuff" and gives you a good starting place. Anyway, this was fun for me ... :) ... and I needed a break from coding my website, because it's tedious for me now (it was 8 years old, so I'm redoing it in WordPress, and I've done about a zillion WordPress sites) ...
Cheers, hope it helps!
Upvotes: 2
Reputation: 23
Found the Answer
$testAdd = "https://testing.google.co.uk";
$parse = parse_url($testAdd);
$httpreferer = preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $parse['host']);
echo $httpreferer;
This also handles domains with a country-code TLD, such as the .co.uk in the example above.
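A quick check of that regex against a few hosts, as a sketch. The wrapper name baseDomain and the extra test URLs are assumptions added for illustration; the pattern is unchanged from the snippet above.

```php
<?php
// Regex copied from the post above; baseDomain is a hypothetical wrapper name.
function baseDomain(string $url): string
{
    // Fall back to an empty string when the URL has no host.
    $host = parse_url($url)['host'] ?? '';
    $pattern = '/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/';
    return preg_replace($pattern, '$2', $host);
}

echo baseDomain('https://testing.google.co.uk'), "\n"; // google.co.uk
echo baseDomain('http://google.com'), "\n";            // google.com
echo baseDomain('http://www.google.com'), "\n";        // google.com
echo baseDomain('https://a.example.com.au'), "\n";     // com.au
```

The .co.uk case works because the domain part must be at least three characters long, so "co" cannot be taken as the domain. A three-letter second level such as com.au does qualify, so that host comes back as com.au.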
Thanks for all your help.
Upvotes: 0