Reputation: 623
The code below prints www.sub.google.com:
import java.net.MalformedURLException;
import java.net.URL;

public class GetDomainNameFromURL {
    public static void main(String[] args) throws MalformedURLException {
        String s = "http://www.sub.google.com/main?&t=20&f=52";
        URL u = new URL(s);
        // getHost() returns the full host name, subdomains included
        String hostName = u.getHost();
        System.out.println(hostName); // prints www.sub.google.com
    }
}
How can I print google.com instead? I need to use plain Java, without the Guava libraries.
Upvotes: 1
Views: 1484
Reputation: 51711
This is tricky because the URL class can only get you so far. It gives you the host name, and then it's up to you to extract the domain name minus the subdomain.
To identify the domain name you need to know beforehand which suffixes to expect: TLDs (top-level domains such as .com or .co) and country-code suffixes (a ccTLD such as .uk, or a second-level suffix under it such as .co.uk). The suffix determines at which dot your domain name starts.
For example, the following regex:
(?<=\.)[^.]+\.(com|co(\.uk)?|uk)$
would extract the domain name (google.com, google.co, google.co.uk and google.uk respectively) from the following host names:
www.google.com
mail.google.co
www.google.co.uk
www.sub.google.uk
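The regex above can be tried out in plain Java along the following lines. This is only a minimal sketch: the class name DomainNameExtractor and the method extractDomain are made up for illustration, the suffix list is just the one from the example, and the lookbehind is written as (?<=^|\.) here, an assumed tweak so that a host with no subdomain at all still matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainNameExtractor {
    // Illustrative whitelist of expected suffixes; extend the alternation for other TLDs.
    // (?<=^|\.) requires the match to begin at the start of the host or right after a dot
    // (assumed variant of the lookbehind, so that a bare "google.com" is accepted too).
    private static final Pattern DOMAIN =
            Pattern.compile("(?<=^|\\.)[^.]+\\.(com|co(\\.uk)?|uk)$");

    // Returns the domain without subdomains, or null if the host does not end
    // in one of the whitelisted suffixes.
    static String extractDomain(String host) {
        Matcher m = DOMAIN.matcher(host);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] hosts = {
            "www.sub.google.com", "mail.google.co", "www.google.co.uk", "www.sub.google.uk"
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + extractDomain(host));
        }
    }
}
```

In the question's code, the value returned by u.getHost() would be passed to extractDomain. A host whose suffix is not in the list yields null, so the caller has to decide what to do in that case.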
A more generic solution would have to make assumptions up front, for example that a TLD or ccTLD has no more than two or three characters, so that it can be told apart from the main domain. With newer TLDs like .guru, .photos, .expert and .legal, that assumption no longer holds, so it cannot be made to work for all domains.
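To see where such a length-based assumption breaks down, here is a rough sketch of the heuristic. The names SuffixLengthHeuristic and guessDomain are hypothetical, www.example.guru is a made-up host, and the three-character cutoff is exactly the assumption described above, not a rule that real domains follow.

```java
import java.util.Arrays;

class SuffixLengthHeuristic {
    // Naive guess: trailing labels of at most 3 characters are treated as the suffix,
    // and the first longer label to their left is taken as the domain label.
    static String guessDomain(String host) {
        String[] labels = host.split("\\.");
        int i = labels.length - 1;
        while (i > 0 && labels[i].length() <= 3) {
            i--;
        }
        // Join the guessed domain label with everything to its right.
        return String.join(".", Arrays.copyOfRange(labels, i, labels.length));
    }

    public static void main(String[] args) {
        System.out.println(guessDomain("www.sub.google.com")); // google.com
        System.out.println(guessDomain("www.google.co.uk"));   // google.co.uk
        System.out.println(guessDomain("www.example.guru"));   // guru (wrong: .guru is the TLD)
    }
}
```

The first two hosts come out right, but the four-letter TLD in the last one is mistaken for the domain label itself, which is why a list of known suffixes is still needed.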
Upvotes: 2