Enhancing regex to match more URLs

Question

Consider this regex:

  static String AdrPattern = "(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)";

I have two small questions:

How can I make it match URLs that contain only the domain name, with no further path segment (such as https://stackoverflow.com)?
How can I make this regex match URLs with domain extensions other than .com?

P.S.: The regex is taken from here and works fine, but these two shortcomings need to be fixed.
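A minimal sketch to see both the working case and the shortcomings, assuming the pattern is used from Java with its backslashes doubled for the string literal (\\. and \\G). The class OriginalPatternDemo and the method parts are hypothetical names for illustration. The first match consumes the domain and the first segment; each later find() resumes at \G, the end of the previous match, and picks up one more segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // The question's pattern; \. and \G are written as \\. and \\G inside a Java string literal.
    static final Pattern ADR = Pattern.compile(
            "(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)");

    // Hypothetical helper: collects the domain (group 1) and one path segment (group 2) per match.
    static List<String> parts(String url) {
        List<String> out = new ArrayList<>();
        Matcher m = ADR.matcher(url);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1)); // only set on the match that starts at http://www.
            }
            out.add(m.group(2));     // later matches resume at \G, the end of the previous match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("http://www.example.com/a/b/c")); // [example, a, b, c]
        System.out.println(parts("http://www.example.com"));       // [] (no path segment)
        System.out.println(parts("http://www.example.org/a"));     // [] (not .com)
    }
}
```

With these inputs the pattern returns the domain and every segment for the .com URL, but nothing for the domain-only URL or the .org URL, which are the two cases asked about.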

EDIT

With the code below, the regex from the answer to this post skips the further path segments and prints only the domain name:

    static String AdrPattern = "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net|gov|org)(?:/([^\\s/]+))?)";
    static Pattern WebUrlPattern = Pattern.compile(AdrPattern);

    // Inside the method that processes each input line
    // (line, prefix and the output stream fop come from the surrounding code, which is not shown).
    Matcher WebUrlMatcher = WebUrlPattern.matcher(line);
    String newline = System.getProperty("line.separator");

    int cnt = 0;
    while (WebUrlMatcher.find()) {
        // Once a domain has been written, cnt is 1 and all later matches skip this block
        if (cnt == 0) {
            String extractedPath = WebUrlMatcher.group(1);   // segment from the \G branch
            if (extractedPath != null) {
                fop.write(prefix.toLowerCase().getBytes());
                fop.write(newline.getBytes());
                fop.write(extractedPath.toLowerCase().getBytes());
                fop.write(newline.getBytes());
            }

            String extractedPart = WebUrlMatcher.group(2);   // domain name
            String extracted2 = WebUrlMatcher.group(3);      // first path segment, if any
            if (extractedPart != null) {
                fop.write(extractedPart.toLowerCase().getBytes());
                fop.write(newline.getBytes());

                if (extracted2 != null) {
                    fop.write(extracted2.toLowerCase().getBytes());
                    fop.write(newline.getBytes());
                }

                cnt = cnt + 1;
            }
        }
    }
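A minimal check, assuming the answer's pattern exactly as given in the EDIT (backslashes doubled for Java): the sketch below records every non-null group of every match, with no cnt guard. The class AnswerPatternDemo and the method parts are hypothetical. Under that assumption, the later segments come back as group 1 of the \G matches, and domain-only URLs and .net/.org domains are matched; a URL without the literal http://www. prefix, such as https://stackoverflow.com, is still not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerPatternDemo {
    // The answer's pattern from the EDIT, with Java string escapes doubled.
    static final Pattern ADR = Pattern.compile(
            "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net|gov|org)(?:/([^\\s/]+))?)");

    // Hypothetical helper: records every non-null group of every match, with no cnt-style guard.
    static List<String> parts(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = ADR.matcher(line);
        while (m.find()) {
            String domain = m.group(2);        // set on the match that starts at http://www.
            String firstSegment = m.group(3);  // optional segment right after the extension
            String nextSegment = m.group(1);   // set on each later \G match
            if (domain != null) out.add(domain);
            if (firstSegment != null) out.add(firstSegment);
            if (nextSegment != null) out.add(nextSegment);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parts("http://www.example.org/a/b/c")); // [example, a, b, c]
        System.out.println(parts("http://www.example.net"));       // [example]
        System.out.println(parts("https://stackoverflow.com"));    // []
    }
}
```

Note that this loop processes every match, whereas the loop in the EDIT only handles matches while cnt is 0.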
