jay32mcr

Reputation: 23

Getting the base domain name in PHP

function getHost($Address) { 
    $parseUrl = parse_url(trim($Address)); 
    // Use the parsed host if there is one; otherwise fall back to the first path segment.
    return trim(!empty($parseUrl['host'])
            ? $parseUrl['host'] 
            : explode('/', $parseUrl['path'], 2)[0]
    ); 
} 

$httpreferer = getHost($_SERVER['HTTP_REFERER']);
$httpreferer = preg_replace('#^(http(s)?://)?w{3}\.#', '$1', $httpreferer);

echo $httpreferer; 

I am using this to strip http://, www and subdomains so that only the host is returned. However, it returns the following:

http://site.google.com ==> google.com
http://google.com      ==> com

How do I get it to remove the subdomain only when one exists, instead of stripping down to the TLD when there isn't one?

Upvotes: 0

Views: 135

Answers (2)

ArtisticPhoenix

Reputation: 21661

Start with parse_url, specifically parse_url($url)['host']:

 $arr = parse_url($url);
 echo preg_replace('/^www\./', '', $arr['host'])."\n";

Output

site.google.com
google.com
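
For reuse, the snippet above could be wrapped in a small function. This is only a sketch: the name hostWithoutWww is made up for this example, it assumes PHP 7.1 or later for the nullable return type, and it returns null when parse_url finds no host (for instance when the URL has no scheme).

```php
<?php
// Hypothetical helper, not part of the original answer: returns the host of
// a URL with a leading "www." removed, or null when no host can be parsed.
function hostWithoutWww(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // e.g. "google.com" without a scheme is parsed as a path
    }
    // Case-insensitive, so "WWW.google.com" is handled as well.
    return preg_replace('/^www\./i', '', $host);
}

foreach (['http://site.google.com', 'http://www.google.com', 'http://google.com'] as $url) {
    echo hostWithoutWww($url), "\n";
}
```

Only the www. prefix is removed here, so site.google.com keeps its subdomain.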

The regex just matches www. when it is at the start of the string. You could probably do this part a few other ways.

No subdomain

If you don't want any subdomain at all:

$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host)."\n";
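
To get a feel for what this pattern does, here is a small sketch that runs it against a few hosts. The function name stripOneSubdomain and the sample hosts are made up for this example. Note that the optional group removes at most one leading label.

```php
<?php
// Hypothetical wrapper around the pattern above; the name is invented
// for this example.
function stripOneSubdomain(string $host): string
{
    // The optional non-capturing group consumes at most one leading label,
    // and only if at least "label.something" is left after it.
    return preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host);
}

foreach (['site.google.com', 'google.com', 'a.b.google.com', 'google.co.uk'] as $host) {
    echo str_pad($host, 16), ' => ', stripOneSubdomain($host), "\n";
}
```

With these inputs, site.google.com and google.com both come out as google.com, a.b.google.com becomes b.google.com, and google.co.uk is cut down to co.uk, so this version is best suited to hosts with a single-label TLD.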

No subdomain, no Country Code

$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2', $host)."\n";

How it works:

This is the same as the previous pattern, except that the domain name and the TLD are captured in separate groups. Instead of capturing everything after the dot, the second group captures everything except a . (that is, [^.]+). Outside the groups, .*? matches whatever is left (confusingly, . means "any character" here): * means zero or more times, and ? makes the match non-greedy, so it doesn't take characters from the preceding expressions.

To put it another way: match anything zero or more times, without stealing characters from the earlier groups. With www.google.com there is nothing after .com, so .*? matches zero characters. With www.google.com.uk it matches the .uk, which is then dropped because only $1$2 is kept.
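
To make the groups visible, here is a small sketch using preg_match. It uses the same pattern with one extra capture group around .*?, added purely for illustration. The function name explainHost and the array keys are made up for this example.

```php
<?php
// Same pattern as above, plus a third capture group (added only for this
// illustration) around the lazy ".*?" so that its match can be inspected.
$pattern = '/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+)(.*?)$/';

// Hypothetical helper: returns the pieces the pattern captured for a host.
function explainHost(string $pattern, string $host): array
{
    preg_match($pattern, $host, $m);
    return [
        'domain'  => $m[1] ?? '',
        'tld'     => $m[2] ?? '',
        'dropped' => $m[3] ?? '', // what ".*?" matched; discarded by '$1$2'
    ];
}

foreach (['www.google.com', 'www.google.com.uk'] as $host) {
    $p = explainHost($pattern, $host);
    printf("%s: domain=%s tld=%s dropped='%s'\n", $host, $p['domain'], $p['tld'], $p['dropped']);
}
```

For www.google.com the dropped part is empty, while for www.google.com.uk it is .uk, which matches the description above.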

Single Line Answer.

Newer versions of PHP (5.4 and later, which added function array dereferencing) let you do this:

   $host = parse_url($url)['host'];

So, taking the last example, we can compress it into one line and drop the variable assignment.

  echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2',parse_url($url)['host'])."\n";

That was just for fun!

Summary

Using parse_url is really the "correct" way to do it, or at least the proper way to start, as it removes a lot of the other "stuff" and gives you a good starting point. Anyway, this was fun for me ... :) ... And I needed a break from coding my website, because it's tedious for me now (it was 8 years old, so I'm redoing it in WordPress, and I've done about a zillion WordPress sites) ...

Cheers, hope it helps!

Upvotes: 2

jay32mcr

Reputation: 23

Found the Answer

$testAdd = "https://testing.google.co.uk";
$parse = parse_url($testAdd);
$httpreferer = preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $parse['host']);


echo $httpreferer;

This also handles domains with a country-code TLD such as .co.uk.
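
One caveat, as far as I can tell: the pattern relies on the second-level label being short (co), so a host like www.example.com.au would come back as com.au. A different technique, which is not what the code above does, is to check the last two labels against a list of known multi-label suffixes. Below is a minimal sketch of that idea. The constant MULTI_LABEL_SUFFIXES, its three entries and the function name registrableDomain are all assumptions made for this example, and a real list would need to be far more complete.

```php
<?php
// Hypothetical, hand-maintained list of multi-label suffixes. These three
// entries are only examples; a real application would need many more.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'com.au', 'co.jp'];

// Hypothetical helper: keeps the suffix plus one label in front of it.
function registrableDomain(string $host): string
{
    $labels = explode('.', strtolower($host));
    if (count($labels) <= 2) {
        return implode('.', $labels); // nothing to strip
    }
    $lastTwo = implode('.', array_slice($labels, -2));
    // Keep three labels when the last two form a known suffix, else two.
    $keep = in_array($lastTwo, MULTI_LABEL_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

foreach (['testing.google.co.uk', 'site.google.com', 'google.com', 'www.example.com.au'] as $host) {
    echo str_pad($host, 22), ' => ', registrableDomain($host), "\n";
}
```

The trade-off is that the list has to be kept up to date by hand, whereas the regex needs no data but can only guess at where the suffix starts.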

Thanks for all your help.

Upvotes: 0
