ramesh kumar

Reputation: 1707

Extract the domain name suffix from any URL

I have a URL in string format like this:

str="http://code.google.com"

and another like str="http://sub.google.co.in"

I want to extract google.com from the first one and google.co.in from the second.

What I did is:

var a, d, i, ind, j, till, total;

a = document.createElement('a');

a.href = "http://www.wv.sdf.sdf.sd.ds..google.co.in";

d = "";

if (a.host.substr(0, 4) === "www.") {
  d = a.host.replace("www.", "");
} else {
  d = a.host;
}

till = d.indexOf(".com");

total = 0;

for (i in d) {
  if (i === till) {
    break;
  }
  if (d[i] === ".") {
    total++;
  }
}

j = 1;

while (j < total) {
  ind = d.indexOf(".");
  d = d.substr(ind + 1, d.length);
  j++;
}

alert(d);

My code works, but only for ".com". It doesn't work for others like ".co.in" or ".co.uk" unless I specify them manually. Can anyone tell me a solution for this? I don't mind changing the whole code, as long as it works. Thanks.

Upvotes: 2

Views: 1422

Answers (2)

carlodurso

Reputation: 2894

Regular expressions are quite powerful for such problems.

https://regex101.com/r/rW4rD8/1

The code below should fit this purpose.

var getSuffixOnly = function (url) {

    var normalized = url.toLowerCase();
    var noProtocol = normalized.replace(/.*?:\/\//g, "");
    var splittedURL = noProtocol.split(/\/|\?+/g);

    if (splittedURL.length > 1){
        noProtocol = splittedURL[0].toString().replace(/[&\/\\#,+()$~%'":*?<>{}£€^ ]/g, '');
    }

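    // Match either the final label alone, or a 2-3 character label followed by a 2-character label.
    var regex = /([^.]{2,}|[^.]{2,3}\.[^.]{2})$/g;
    var host = noProtocol.match(regex);

    // match() returns null when nothing matches, so guard before converting.
    return host ? host.toString() : "";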

};

getSuffixOnly(window.location.host);
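
To see what this pattern actually captures, here is a minimal sketch that applies the same regular expression directly to a few hostnames. The helper name suffixGuess is hypothetical and only used for illustration. It suggests the pattern yields the suffix alone (not google.com or google.co.in), and that, being a length-based guess, it can misread a short domain such as bit.ly as a two-part suffix.

```javascript
// Hypothetical helper: applies the answer's pattern to a bare hostname.
function suffixGuess(hostname) {
  var m = hostname.toLowerCase().match(/([^.]{2,}|[^.]{2,3}\.[^.]{2})$/);
  return m ? m[0] : "";
}

console.log(suffixGuess("code.google.com"));  // "com"
console.log(suffixGuess("sub.google.co.in")); // "co.in"
console.log(suffixGuess("bit.ly"));           // "bit.ly" (the real suffix is "ly")
```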

Upvotes: 0

Alnitak

Reputation: 340055

The only current practical solution (and even that doesn't work 100%) is to refer to the Public Suffix List in your code, and synchronise with that list as required.

There is no algorithm that can look at a domain name and figure out which part is the "registered domain name" and which parts are subdomains. It can't even be done by interrogating the DNS itself.
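
As a rough illustration of the list-based approach, the sketch below looks up the longest matching suffix and keeps one extra label in front of it. The names SAMPLE_SUFFIXES and registrableDomain are hypothetical, and the five entries are a hand-picked subset chosen for this example. A real implementation would load the full Public Suffix List and also handle its wildcard and exception rules, which this sketch ignores.

```javascript
// Illustrative subset only (an assumption for this example); the real
// Public Suffix List has thousands of entries plus wildcard/exception rules.
const SAMPLE_SUFFIXES = new Set(["com", "in", "co.in", "uk", "co.uk"]);

// Hypothetical helper: returns the longest listed suffix plus one more label,
// or null when no listed suffix applies.
function registrableDomain(url, suffixes = SAMPLE_SUFFIXES) {
  const labels = new URL(url).hostname.toLowerCase().split(".");

  // A host that is itself a public suffix has no registrable domain.
  if (suffixes.has(labels.join("."))) {
    return null;
  }

  // Try the longest candidate suffix first, always leaving one label in front.
  for (let i = 1; i < labels.length; i++) {
    if (suffixes.has(labels.slice(i).join("."))) {
      return labels.slice(i - 1).join(".");
    }
  }
  return null; // suffix not in the list
}

console.log(registrableDomain("http://code.google.com"));  // "google.com"
console.log(registrableDomain("http://sub.google.co.in")); // "google.co.in"
```

Because the result depends entirely on the list contents, the list has to be refreshed from time to time, which is the synchronisation step mentioned above.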

Upvotes: 3
