Reputation: 4541
I need a regex that extracts only domainname.extension from a URL. Right now I have a regex that strips "www." from the hostname, but I need to update it to remove any subdomain from the hostname:
This strips off www.:
window.location.hostname.replace(/^www\./i, '')
But I need to detect any subdomain in abc.def.test.com or ghi.test.com, replace it with an empty string, and always return "test.com".
Upvotes: 1
Views: 1501
Reputation: 12688
Well, that depends mainly on how you define a domain and how you define a subdomain. I'll use the most general approach and treat the last two components as the domain (as you do with test.com). In that case you can proceed as follows:
([a-zA-Z0-9-]+\.)*([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+) ==> $2
As you can see, the regexp is divided into two groups, and only the second one, which holds the last two domain components, is kept in the output. The [a-zA-Z0-9-] subexpression deserves some explanation, as it appears three times in the regexp: it is the set of characters allowed in a domain component, including the hyphen (-). See [1] for a working demo.
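As a rough sketch of how this pattern could be applied in JavaScript: the function name lastTwoLabels is hypothetical, and the ^ and $ anchors are an addition here (an assumption, so that the whole hostname has to match), not part of the pattern as written above.

```javascript
// Hypothetical helper: keeps only the last two dot-separated labels.
// Group 1 consumes any leading labels; group 2 captures the last two.
function lastTwoLabels(hostname) {
  const pattern = /^([a-zA-Z0-9-]+\.)*([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+)$/;
  // If the hostname does not match (e.g. a single label), replace() returns it unchanged.
  return hostname.replace(pattern, '$2');
}

console.log(lastTwoLabels('abc.def.test.com')); // test.com
console.log(lastTwoLabels('ghi.test.com'));     // test.com
```

In the browser you would pass window.location.hostname as the argument.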
If you want to cope with the co.uk example posted in the last demo, so that www.test.co.uk is matched as test.co.uk, then you have to anchor your regexp to the end (with $ or, if you are in the middle of a URL, with the : or / that can follow the domain name). This prevents prefixes from being detected as valid domains, as shown in [2]:
(([a-zA-Z0-9-]+\.)*?)([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+(\.(uk|au|tw|cn))?)$ ==> $3
or, as in [3]:
(([a-zA-Z0-9-]+\.)*?)([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+(\.(uk|au|tw|cn))?)(?=[:/]|$) ==> $3
Of course, the list has to include every country code that follows the convention of reusing top-level names (such as co) as prefixes under its own structure. Be careful here, as not all countries follow this approach. I've used the non-greedy *? operator because otherwise the groups don't match as desired (the first group gets greedy, and the match is again co.uk instead of test.co.uk).
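A minimal JavaScript sketch of the end-anchored variant, keeping the same group layout so that group 3 is the result. The function name registrableDomain is hypothetical, and the (uk|au|tw|cn) list is only the illustrative one used here, not a complete list.

```javascript
// Hypothetical helper: last two labels, plus a third one for the listed country codes.
function registrableDomain(hostname) {
  // Groups 1 and 2: leading labels, matched lazily (*?) so they give up as much as possible.
  // Group 3: label.label, optionally followed by one of the listed country codes.
  const pattern = /(([a-zA-Z0-9-]+\.)*?)([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+(\.(uk|au|tw|cn))?)$/;
  const m = pattern.exec(hostname);
  // Fall back to the input when nothing matches (e.g. a single-label host).
  return m ? m[3] : hostname;
}

console.log(registrableDomain('www.test.co.uk'));   // test.co.uk
console.log(registrableDomain('abc.def.test.com')); // test.com
```

Note that this is a heuristic: a name registered directly under one of the listed codes (for example www.example.uk) would be returned with its first label intact.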
But since you ultimately have to anchor your regexp (mainly because domain names can also appear in the query string or in the path of the URL), it is best to anchor it to the whole URL.
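One way this whole-URL anchoring might look in JavaScript is sketched below. The function name domainFromUrl, the scheme and userinfo prefixes, and the extra ? and # terminators in the lookahead are all assumptions added for illustration; the suffix list is again just the sample one.

```javascript
// Hypothetical helper: anchors at the start of the URL so that only the host part is inspected.
function domainFromUrl(url) {
  const pattern = new RegExp(
    '^[a-zA-Z][a-zA-Z0-9+.-]*://' +   // scheme, e.g. "https://"
    '(?:[^/?#@]*@)?' +                // optional userinfo, e.g. "user:pass@"
    '(?:[a-zA-Z0-9-]+\\.)*?' +        // leading subdomain labels, matched lazily
    '([a-zA-Z0-9-]+\\.[a-zA-Z0-9-]+(?:\\.(?:uk|au|tw|cn))?)' + // the part we want
    '(?=[:/?#]|$)'                    // host must end at port, path, query, fragment or end
  );
  const m = pattern.exec(url);
  return m ? m[1] : null; // null when the host has no dot or the URL has no scheme
}

console.log(domainFromUrl('https://www.test.co.uk:8080/path?ref=other.example.com')); // test.co.uk
console.log(domainFromUrl('http://localhost/foo.bar/x')); // null
```

Because the pattern starts at ^, dotted names in the path or query string are never considered.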
Upvotes: 0
Reputation: 48751
You could achieve the same result with the replace method, but match is somewhat more suitable:
console.log(
window.location.hostname.match(/[^\s.]+\.[^\s.]+$/)[0]
);
[^\s.]+ : match one or more non-whitespace characters except a dot
$ : assert end of input string

Doing the same with the replace method, as requested in the comments:
console.log(
window.location.hostname.replace(/[^\s.]+\.(?=[^\s.]+\.)/g, '')
);
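Both variants can be wrapped in small functions so they can be tried outside the browser. This is only a sketch: the names lastTwoByMatch and lastTwoByReplace are hypothetical, the null check is an addition (match returns null when nothing matches, e.g. for a single-label host), and the lookahead is written as [^\s.]+\. so that following labels of any length are recognised.

```javascript
// Hypothetical helper using match: grabs "label.label" at the end of the string.
function lastTwoByMatch(hostname) {
  const m = hostname.match(/[^\s.]+\.[^\s.]+$/);
  return m ? m[0] : hostname; // guard against null for hosts without a dot
}

// Hypothetical helper using replace: removes every label that is still
// followed by another "label." (i.e. everything except the last two labels).
function lastTwoByReplace(hostname) {
  return hostname.replace(/[^\s.]+\.(?=[^\s.]+\.)/g, '');
}

console.log(lastTwoByMatch('abc.def.test.com'));   // test.com
console.log(lastTwoByReplace('abc.def.test.com')); // test.com
```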
Upvotes: 2