Omer Danish

Reputation: 101

How to extract only the website name from a URL string, without "www." and ".com"

I want to show only the website name.
I don't want to show ".com", "us.cnn.com" or "www.bbc.co.uk", just the name of the website, such as "cnn" or "bbc".
My code:

private String getHostName(String urlInput) {
    urlInput = urlInput.toLowerCase();
    String hostName = urlInput;
    if (!urlInput.equals("")) {
        if (urlInput.startsWith("http") || urlInput.startsWith("https")) {
            try {
                URL netUrl = new URL(urlInput);
                String host = netUrl.getHost();
                if (host.startsWith("www")) {
                    hostName = host.substring("www".length() + 1);
                } else {
                    hostName = host;
                }
            } catch (MalformedURLException e) {
                hostName = urlInput;
            }
        } else if (urlInput.startsWith("www")) {
            hostName = urlInput.substring("www".length() + 1);
        }
        return hostName;
    } else {
        return "";
    }
}  

Inputs

http://www.bbc.co.uk/news/world-us-canada-39018776
http://us.cnn.com/2017/02/18/politics/john-mccain-donald-trump-dictators/index.html
http://bigstory.ap.org/article/d5dd5962fc4d42b195117ca63e0ba9af/revived-rally-trump-turns-back-governing

Outputs

www.bbc.co.uk  
us.cnn.com  
bigstory.ap.org

I just want to extract the names "bbc", "cnn" and "ap" from these.

Upvotes: 1

Views: 1170

Answers (3)

Manohar

Reputation: 23404

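String mainUrl;
urlInput = urlInput.toLowerCase();
String hostName = urlInput;
// Split on literal dots, e.g. "http://www.bbc.co.uk/..." -> ["http://www", "bbc", "co", ...]
String[] suburls = hostName.split("\\.");
mainUrl = suburls[0];
// If the first piece contains "www", the name is the next piece
if (suburls[0].contains("www") && suburls.length > 1) {
    mainUrl = suburls[1];
}
// String.replace returns a new string, so the result must be assigned back
if (mainUrl.contains("http://")) {
    mainUrl = mainUrl.replace("http://", "");
} else if (mainUrl.contains("https://")) {
    mainUrl = mainUrl.replace("https://", "");
}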

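The result is now in mainUrl. Note that this only skips a leading "www"; for a host with a different subdomain, such as us.cnn.com, it yields the subdomain ("us") instead.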

Upvotes: 0

Maulik Santoki

Reputation: 532

First convert your website URL to a URI:

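import java.net.URI;
import java.net.URISyntaxException;

public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    // Strip a leading "www." if present; the rest of the host is returned unchanged
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}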
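
To see what this approach returns for the URLs in the question, here is a small sketch. The class name HostDemo and the helper strippedHost are made up for illustration; the helper follows the same logic as getDomainName above, except that it returns null instead of throwing when the URL cannot be parsed or has no host.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostDemo {
    // Hypothetical helper: host of the URL without a leading "www.",
    // or null if the URL is malformed or has no host component.
    static String strippedHost(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return null;
            }
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://www.bbc.co.uk/news/world-us-canada-39018776",
            "http://us.cnn.com/2017/02/18/politics/john-mccain-donald-trump-dictators/index.html",
            "http://bigstory.ap.org/article/d5dd5962fc4d42b195117ca63e0ba9af/revived-rally-trump-turns-back-governing"
        };
        for (String input : inputs) {
            System.out.println(strippedHost(input));
        }
        // Prints, one per line: bbc.co.uk, us.cnn.com, bigstory.ap.org
    }
}
```

So this removes "www." only; subdomains such as "us" and suffixes such as ".co.uk" are still part of the result.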


Upvotes: -1

Sid Mhatre

Reputation: 3417

You can use the java.net.URI class to extract the hostname from the string.

Example code:

import java.net.URI;
import java.net.URISyntaxException;

// new URI(...) throws the checked URISyntaxException, so it must be declared
public String getHostName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String hostname = uri.getHost();
    // getHost() returns null when the URL has no host; otherwise strip a leading "www."
    if (hostname != null) {
        return hostname.startsWith("www.") ? hostname.substring(4) : hostname;
    }
    return null;
}

This gives you the hostname without a leading "www.", so both http://google.com/... and http://www.google.com/... return "google.com". It does not strip the top-level domain, so the result is "google.com" rather than "google".

If the given URL has no identifiable host (for example, because the scheme is missing), it returns null.
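
To get from the hostname to just "bbc", "cnn" or "ap", the subdomains and the suffix still have to be dropped. Below is a minimal sketch of one heuristic, not a definitive solution: take the label before the TLD, and step back one more label when the TLD is a two-letter country code preceded by a common second-level label such as "co". The class name SiteName, the method siteName and the SECOND_LEVEL set are assumptions made up for this example. The set is not exhaustive, and IP addresses are not handled; covering every suffix correctly would need something like the Public Suffix List.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SiteName {
    // Assumption: a small hand-picked set of second-level labels that appear
    // under country-code TLDs (as in "co.uk"). Not exhaustive.
    private static final Set<String> SECOND_LEVEL =
            new HashSet<>(Arrays.asList("co", "com", "org", "net", "gov", "ac", "edu"));

    // Returns the guessed site name, or null if the URL is malformed or has no host.
    static String siteName(String url) {
        String host;
        try {
            host = new URI(url.trim()).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
        if (host == null) {
            return null;
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        if (labels.length < 2) {
            return labels[0]; // single-label host such as "localhost"
        }
        int index = labels.length - 2; // label directly before the TLD
        String tld = labels[labels.length - 1];
        // For hosts like "www.bbc.co.uk", skip the "co" label as well.
        if (labels.length >= 3 && tld.length() == 2 && SECOND_LEVEL.contains(labels[index])) {
            index--;
        }
        return labels[index];
    }

    public static void main(String[] args) {
        System.out.println(siteName("http://www.bbc.co.uk/news/world-us-canada-39018776")); // bbc
        System.out.println(siteName("http://us.cnn.com/2017/02/18/politics/index.html"));   // cnn
        System.out.println(siteName("http://bigstory.ap.org/article/revived-rally"));       // ap
    }
}
```

Because the heuristic counts labels from the right, it ignores any number of subdomains ("www", "us", "bigstory") without listing them explicitly.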

Upvotes: 3
