Reputation: 1707
I have a URL in string format like this:
str="http://code.google.com"
and others like str="http://sub.google.co.in".
I want to extract google.com from the first one, and google.co.in from the second string.
What I did is:
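var a, d, i, ind, j, till, total;
// Let the browser parse the URL; a.host is the host part
a = document.createElement('a');
a.href = "http://www.wv.sdf.sdf.sd.ds..google.co.in";
d = "";
// Strip a leading "www."
if (a.host.substr(0, 4) === "www.") {
    d = a.host.replace("www.", "");
} else {
    d = a.host;
}
// Intended to count the dots before ".com". Note: for...in yields string
// keys, so i === till never matches and every dot in d is counted.
till = d.indexOf(".com");
total = 0;
for (i in d) {
    if (i === till) {
        break;
    }
    if (d[i] === ".") {
        total++;
    }
}
// Remove leading labels one at a time (total - 1 times)
j = 1;
while (j < total) {
    ind = d.indexOf(".");
    d = d.substr(ind + 1, d.length);
    j++;
}
alert(d);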
My code works, but only for ".com"; it doesn't work for others like ".co.in" or ".co.uk" unless I specify them manually. Can anyone tell me a solution for this? I don't mind changing the whole code, as long as it works. Thanks
Upvotes: 2
Views: 1422
Reputation: 2894
Regular expressions are quite powerful for such problems.
https://regex101.com/r/rW4rD8/1
The code below should fit this purpose.
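var getSuffixOnly = function (url) {
    // Lower-case and strip the protocol ("http://", "https://", ...)
    var normalized = url.toLowerCase();
    var noProtocol = normalized.replace(/.*?:\/\//g, "");
    // Keep only the host part: drop any path or query string
    var splittedURL = noProtocol.split(/\/|\?+/g);
    if (splittedURL.length > 1){
        noProtocol = splittedURL[0].toString().replace(/[&\/\\#,+()$~%'":*?<>{}£€^ ]/g, '');
    }
    // Match either a final label of 2+ chars ("com") or a 2-3 char label
    // followed by a 2-char label ("co.in", "co.uk"). Only the suffix is
    // returned, e.g. "com" for code.google.com, "co.in" for sub.google.co.in
    var regex = /([^.]{2,}|[^.]{2,3}\.[^.]{2})$/g;
    var host = noProtocol.match(regex);
    // match() returns null when nothing matches
    return host ? host.toString() : "";
};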
getSuffixOnly(window.location.host);
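If you need the registrable part (google.com, google.co.in) rather than only the suffix, one hedged option is to put one more label in front of the same suffix pattern. The sketch below uses a hypothetical helper, `getDomainHeuristic`, which is not part of the answer above; it reuses the answer's suffix regex, so it inherits the same guesswork:

```javascript
// Hypothetical helper (not from the answer above): a heuristic that keeps
// the suffix matched by getSuffixOnly's pattern plus one label before it.
// It is NOT backed by the Public Suffix List, so it can be wrong.
function getDomainHeuristic(url) {
  // Lower-case, drop a protocol like "http://", then cut at the first
  // "/", "?", "#" or ":" so that path, query, fragment and port are removed.
  var hostname = url
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "")
    .split(/[\/?#:]/)[0];
  // One label, a dot, then either a 2+ char label ("com") or a
  // 2-3 char label plus a 2-char label ("co.in", "co.uk").
  var match = hostname.match(/[^.]+\.([^.]{2,}|[^.]{2,3}\.[^.]{2})$/);
  return match ? match[0] : null;
}

console.log(getDomainHeuristic("http://code.google.com"));  // "google.com"
console.log(getDomainHeuristic("http://sub.google.co.in")); // "google.co.in"
```

This is still guessing from the shape of the name: "www.abc.de" returns "www.abc.de" because "abc.de" looks like a "co.uk"-style suffix, and "foo.github.io" returns "github.io" even though github.io is listed as a public suffix.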
Upvotes: 0
Reputation: 340055
The only current practical solution (and even that doesn't work 100%) is to refer to the Public Suffix List in your code, and synchronise with that list as required.
There is no algorithm that can look at a domain name and figure out which part is the "registered domain name" and which parts are subdomains. It can't even be done by interrogating the DNS itself.
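To make the list-based approach concrete, here is a minimal sketch of a lookup, assuming the list has already been downloaded and split into lines. `SAMPLE_RULES` is a hypothetical, hand-picked excerpt (a real program must load the full list and keep it synchronised), and `parseRules`, `publicSuffixLength` and `registrableDomain` are illustrative names. It roughly follows the matching rules published with the list: an exception rule (`!www.ck`) wins if one matches, otherwise the matching rule with the most labels wins, a wildcard (`*.ck`) matches any single label, and if nothing matches the last label is treated as the suffix. Internationalised names are not converted to Punycode here.

```javascript
// Hypothetical excerpt of the Public Suffix List, for illustration only.
const SAMPLE_RULES = [
  "// comment lines and blank lines are ignored",
  "com", "in", "co.in", "uk", "co.uk", "io", "github.io",
  "*.ck", "!www.ck",
];

// Split raw lines into normal/wildcard rules and exception rules.
function parseRules(lines) {
  const rules = new Set();
  const exceptions = new Set();
  for (const raw of lines) {
    // A rule is the text up to the first whitespace.
    const line = raw.trim().split(/\s+/)[0].toLowerCase();
    if (line === "" || line.startsWith("//")) continue;
    if (line.startsWith("!")) {
      exceptions.add(line.slice(1));
    } else {
      rules.add(line);
    }
  }
  return { rules, exceptions };
}

// Number of trailing labels of `labels` that form the public suffix.
function publicSuffixLength(labels, { rules, exceptions }) {
  // Exception rules take precedence; the suffix is the exception
  // rule minus its leftmost label.
  for (let i = 0; i < labels.length; i++) {
    if (exceptions.has(labels.slice(i).join("."))) {
      return labels.length - i - 1;
    }
  }
  // Otherwise the longest matching rule wins; scanning from the left,
  // the first match found is the longest one.
  for (let i = 0; i < labels.length; i++) {
    const exact = labels.slice(i).join(".");
    const wildcard = ["*", ...labels.slice(i + 1)].join(".");
    if (rules.has(exact) || rules.has(wildcard)) {
      return labels.length - i;
    }
  }
  // No rule matched: default to the last label as the suffix.
  return 1;
}

// Public suffix plus one label, or null if the host is itself a suffix.
function registrableDomain(hostname, ruleSet) {
  const labels = hostname.toLowerCase().replace(/\.$/, "").split(".");
  const suffixLength = publicSuffixLength(labels, ruleSet);
  if (labels.length <= suffixLength) return null;
  return labels.slice(-(suffixLength + 1)).join(".");
}

const RULES = parseRules(SAMPLE_RULES);
console.log(registrableDomain("code.google.com", RULES));  // "google.com"
console.log(registrableDomain("sub.google.co.in", RULES)); // "google.co.in"
```

Note that google.co.in only comes out right because "co.in" is in the rule set; the knowledge lives in the list, not in the code.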
Upvotes: 3