João Alves

Reputation: 1941

Extract Top Level Domain from Domain name

I have an array of top level domains like:

['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn']

Given a domain name, how can I extract its top-level domain based on the array above? I don't care how many levels the domain has; I only need to extract the top-level domain.

For example:

test1.ag -> should return ag

test2.com.ag -> should return com.ag

test.test2.com.ag -> should return com.ag

test3.org -> should return false

Thanks

Upvotes: 0

Views: 1877

Answers (5)

Traxo

Reputation: 19002

$domains = ['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn'];

$str = 'test.test2.com.ag'; //your string

preg_match('/\b('.str_replace('.', '\.', implode('|', $domains)).')$/', $str, $matches);
//replace . with \. because . is reserved in regex for any character 

$result = $matches[0] ?? false; // ?? avoids an undefined-index warning when nothing matched

Edit: added word boundary in regexp and $result is your string or false

Upvotes: 2

Alex

Reputation: 6037

A regular expression is not really needed here, so it is better avoided.

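  function topDomain($url) {
      $arr = ['ag', 'asia', 'asia_sunrise', 'com', 'hn'];

      // For a bare domain such as "test1.ag", parse_url() puts the whole string in 'path'
      $parts = parse_url($url);
      $labels = explode(".", $parts['host'] ?? $parts['path'] ?? '');
      $last = end($labels);

      // Return the last label if it is a known TLD, otherwise false
      return in_array($last, $arr, true) ? $last : false;
  }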

P.S. 'com.ag' and 'org.hn' are not top-level domains but second-level domains, so they were left out of the example.

Upvotes: 0

CD001

Reputation: 8472

Updated to incorporate Traxo's point about the . wildcard; I think my answer is a little fuller so I'll leave it up but we've both essentially come to the same solution.

//set up test variables
$aTLDList = ['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn'];
$sDomain = "badgers.co.uk"; // for example

//build the match
$reMatch = '/^.*?\.(' . str_replace('.', '\.', implode('|', $aTLDList)) . ')$/';
$sMatchedTLD = preg_match($reMatch, $sDomain) ? 
        preg_replace($reMatch, "$1", $sDomain) : 
        "";

Resorting to Regular Expressions may be overkill but it makes for a concise example. This will give you either the TLD matched or an empty string in the $sMatchedTLD variable.

The trick is to make the first .* match ungreedy (.*?) otherwise badgers.com.ag will match ag rather than com.ag.
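The difference between the greedy and the ungreedy version can be checked with a small sketch. The alternation is shortened to two entries purely for illustration, and the variable names are made up for this example:

```php
<?php
// Shortened alternation, for illustration only (not the full TLD list).
$alternatives = 'ag|com\.ag';

// Greedy .* swallows as much as possible, so only "ag" is left for the group.
preg_match('/^.*\.(' . $alternatives . ')$/', 'badgers.com.ag', $greedy);

// Ungreedy .*? stops at the first dot that lets the rest of the pattern match.
preg_match('/^.*?\.(' . $alternatives . ')$/', 'badgers.com.ag', $lazy);

echo $greedy[1], "\n"; // ag
echo $lazy[1], "\n";   // com.ag
```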

Upvotes: 1

n-dru

Reputation: 9430

First, sort the array so that a longer suffix comes before any shorter one it ends with, for example 'com.ag' before 'ag'. Then:

function get_domain($s){
    $a = ['com.ag', 'ag', 'asia_sunrise', 'asia', 'com', 'org.hn'];
    foreach($a as $v){
        if(preg_match('/(^|\.)' . preg_quote($v, '/') . '$/', $s)){// if it ends with the array's value, on a label boundary
            return $v;
        }
    }
    return false;// if none matched the pattern, loop ends and returns false
}
echo get_domain('test.test2.com.ag');// com.ag
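
If the list arrives in arbitrary order, the ordering could be produced in code instead of by hand. A minimal sketch, assuming that sorting by string length is good enough (a longer suffix is always tried before a shorter one it ends with); the helper name sort_tlds_longest_first is hypothetical:

```php
<?php
// Hypothetical helper: order the suffixes so that longer ones come first.
function sort_tlds_longest_first(array $tlds): array
{
    usort($tlds, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $tlds;
}

$sorted = sort_tlds_longest_first(['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn']);
// 'com.ag' now comes before 'ag', and 'asia_sunrise' before 'asia'
```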

Upvotes: 0

Starx

Reputation: 78991

The parse_url() function gives you access to the host name of the URL. You can use that to process the host name and find the TLD.

$url = 'http://your.url.com.np';
var_dump(parse_url($url, PHP_URL_HOST));

Next steps could be using explode() to split the host name and checking the last item in the exploded list. But I am going to leave that to you.
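
For reference, one way those next steps might look. This is only a sketch: the function name match_tld is hypothetical, and it goes slightly beyond checking the last item by testing progressively shorter suffixes so that multi-part entries such as 'com.ag' from the question's list are found too.

```php
<?php
// Hypothetical helper: returns the longest suffix of the host found in $tlds, or false.
function match_tld(string $url, array $tlds)
{
    // parse_url() only fills PHP_URL_HOST when a scheme is present;
    // fall back to the raw string for bare domains such as "test1.ag".
    $host = parse_url($url, PHP_URL_HOST) ?: $url;
    $labels = explode('.', strtolower($host));

    // Drop the leftmost label first so the full name is never a candidate,
    // then keep dropping labels: the longest suffix is tried first.
    array_shift($labels);
    while ($labels) {
        $candidate = implode('.', $labels);
        if (in_array($candidate, $tlds, true)) {
            return $candidate;
        }
        array_shift($labels);
    }
    return false;
}

$tlds = ['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn'];
var_dump(match_tld('test1.ag', $tlds));          // string(2) "ag"
var_dump(match_tld('test.test2.com.ag', $tlds)); // string(6) "com.ag"
var_dump(match_tld('test3.org', $tlds));         // bool(false)
```

Because the candidates are compared with in_array() rather than a regular expression, the dots need no escaping and the order of the list does not matter.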

Upvotes: 0
