Ramesh

Reputation: 2337

How to check whether a subdomain belongs to the same domain using Java

I have a list of URLs and I need to filter them by a specific domain, including its subdomains. Say I have URLs like:

http://www.example.com
http://test.example.com
http://test2.example.com

I need to extract the URLs that belong to the domain example.com.

Upvotes: 1

Views: 4837

Answers (2)

jmreader

Reputation: 632

I was working on a project that required me to determine whether two URLs are from the same subdomain (even when there are nested domains). I worked up a modification from the guide above. It has held up pretty well so far:

    // Requires: import java.net.MalformedURLException; import java.net.URL;
    public static boolean isOneSubdomainOfTheOther(String a, String b) {

        try {
            URL first = new URL(a);
            String firstHost = first.getHost();
            firstHost = firstHost.startsWith("www.") ? firstHost.substring(4) : firstHost;

            URL second = new URL(b);
            String secondHost = second.getHost();
            secondHost = secondHost.startsWith("www.") ? secondHost.substring(4) : secondHost;

            /*
             Test if one is a substring of the other
             */           
            if (firstHost.contains(secondHost) || secondHost.contains(firstHost)) {

                String[] firstPieces = firstHost.split("\\.");
                String[] secondPieces = secondHost.split("\\.");

                String[] longerHost = {""};
                String[] shorterHost = {""};

                if (firstPieces.length >= secondPieces.length) {
                    longerHost = firstPieces;
                    shorterHost = secondPieces;
                } else {
                    longerHost = secondPieces;
                    shorterHost = firstPieces;
                }
                int minLength = shorterHost.length;
                int i = 1;

                /*
                 Compare from the tail of both host and work backwards
                 */
                while (minLength > 0) {
                    String tail1 = longerHost[longerHost.length - i];
                    String tail2 = shorterHost[shorterHost.length - i];

                    if (tail1.equalsIgnoreCase(tail2)) {
                        //move up one place to the left
                        minLength--;
                    } else {
                        //domains do not match
                        return false;
                    }
                    i++;
                }
                if (minLength == 0) //shorter host exhausted. Is a sub domain
                    return true;
            }
        } catch (MalformedURLException ex) {
            ex.printStackTrace();
        }
        return false;
    }

Figured I'd leave it here for future reference in case someone runs into a similar problem.
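For the original question (filtering a list of URLs down to one domain and its subdomains), a shorter variant of the same tail-matching idea might look like the sketch below. It is only a sketch under a few assumptions: the class and method names (`DomainFilter`, `belongsToDomain`, `filterByDomain`) are made up for illustration, the target domain is passed in without a leading dot, and URLs that cannot be parsed or have no host are simply treated as non-matching.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the answer above.
class DomainFilter {

    // True if the URL's host is the domain itself or any subdomain of it.
    static boolean belongsToDomain(String url, String domain) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // assumption: unparsable URLs count as non-matching
        }
        if (host == null) {
            return false; // e.g. relative or "mailto:" URLs have no host
        }
        host = host.toLowerCase(Locale.ROOT);
        String target = domain.toLowerCase(Locale.ROOT);
        // Matching on "." + target keeps "notexample.com" from passing as "example.com".
        return host.equals(target) || host.endsWith("." + target);
    }

    // Keeps only the URLs whose host belongs to the given domain.
    static List<String> filterByDomain(List<String> urls, String domain) {
        List<String> matches = new ArrayList<>();
        for (String url : urls) {
            if (belongsToDomain(url, domain)) {
                matches.add(url);
            }
        }
        return matches;
    }
}
```

Calling `DomainFilter.filterByDomain(urls, "example.com")` on the three URLs from the question should keep all of them, while a URL on example.net or notexample.com would be dropped.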

Upvotes: 3

Saurabh Agarwal

Reputation: 527

I understand you are probably looking for a fancy solution using the URL class or something similar, but it is not required. Simply think of a way to extract "example.com" from each of the URLs.

Note: example.com is essentially a different domain than say example.net. Thus extracting just "example" is technically the wrong thing to do.

We can break down a sample URL, say:

http://sub.example.com/page1.html

Step 1: Split the URL on the delimiter "/" to extract the part containing the domain.

Each such part can be viewed as the following blocks (any of which may be empty):

[www][subdomain][basedomain]

Step 2: Discard "www" (if present). We are left with [subdomain][basedomain].

Step 3: Split the string on the delimiter ".".

Step 4: Count the strings produced by the split. If there are 2 strings, together they form the target domain (example and com). If there are 3 or more, look at the last string: if its length is 3, the last 2 strings make up the domain (example and com); if its length is 2, the last 3 strings make up the domain (example, co and uk). This is a length-based heuristic, so a two-letter TLD used without a second-level label (sub.example.de, for instance) would be misread as a three-part domain.

I think this should do the trick (I do hope this wasn't homework :D)

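    // You may clean up this method to make it more efficient
    private String getRootDomain(String url) {
        // For "http://sub.example.com/page1.html", split("/")[2] is the host "sub.example.com"
        String[] domainKeys = url.split("/")[2].split("\\.");
        int length = domainKeys.length;
        // 1 if the host starts with "www", which does not count towards the domain
        int dummy = domainKeys[0].equals("www") ? 1 : 0;
        if (length - dummy == 2) {
            return domainKeys[length - 2] + "." + domainKeys[length - 1];
        } else {
            // A two-letter last label is treated as a country code, so keep three labels
            if (domainKeys[length - 1].length() == 2) {
                return domainKeys[length - 3] + "." + domainKeys[length - 2] + "." + domainKeys[length - 1];
            } else {
                return domainKeys[length - 2] + "." + domainKeys[length - 1];
            }
        }
    }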
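To tie this back to the question, here is a rough sketch of how the extracted root domain could be used to filter the list. The names `RootDomainFilter`, `rootDomain` and `keepDomain` are hypothetical, and `rootDomain` is a compact static restatement of the same length-based heuristic, so it shares that heuristic's limits and assumes every URL has the form scheme://host/... with at least two labels in the host.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class built around the heuristic described above.
class RootDomainFilter {

    // Length-based heuristic: keep three labels only when the host has more than
    // two labels (ignoring a leading "www") and the last label has two letters.
    static String rootDomain(String url) {
        String[] keys = url.split("/")[2].split("\\.");
        int n = keys.length;
        int skip = keys[0].equals("www") ? 1 : 0;
        if (n - skip > 2 && keys[n - 1].length() == 2) {
            return keys[n - 3] + "." + keys[n - 2] + "." + keys[n - 1];
        }
        return keys[n - 2] + "." + keys[n - 1];
    }

    // Keeps the URLs whose extracted root domain equals the wanted domain.
    static List<String> keepDomain(List<String> urls, String wanted) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (rootDomain(url).equalsIgnoreCase(wanted)) {
                result.add(url);
            }
        }
        return result;
    }
}
```

With the three URLs from the question, `RootDomainFilter.keepDomain(urls, "example.com")` would be expected to return all three, since each one reduces to example.com.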

Upvotes: 2
