Why does this code (extracting a host name from a URL with a regular expression) fail?

Question

I'm trying to extract the host name from a URL using a regex with a capturing group. I wrote this test to cover the acceptable inputs.

Why does this code fail?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {

    public static void main(String[] args)
    {
        Pattern HostnamePattern = Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

        String[] inputs = new String[]{

                "http://stackoverflow.com",
                "http://stackoverflow.com/",
                "http://stackoverflow.com/path",
                "http://stackoverflow.com/path/path2",
                "http://stackoverflow.com/path/path2/",
                "http://stackoverflow.com/path/path2/?qs1=1",

                "https://stackoverflow.com/path",
                "https://stackoverflow.com/path/path2",
                "https://stackoverflow.com/path/path2/",
                "https://stackoverflow.com/path/path2/?qs1=1",
        };

        for(String input : inputs)
        {
            Matcher matcher = HostnamePattern.matcher(input);
            if(!matcher.matches() || !"stackoverflow.com".equals(matcher.group(1)))
            {
                throw new Error(input+" fails!");
            }
        }

    }

}

anubhava · Accepted Answer

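It fails because you call Matcher#matches, which only succeeds when the regex matches the entire input. Your regex ^https?://([^/]+)/? covers just the scheme, the host, and an optional trailing slash, so any input with a path after the host is rejected.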

You need to use:

matcher.find()

Otherwise your regex will only match the first 2 input strings: http://stackoverflow.com and http://stackoverflow.com/
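To make the difference concrete, here is a minimal sketch that reuses the pattern from the question. The class name HostnameFind and the helpers extractHost and matchesWhole are made up for illustration; only Pattern, Matcher#find, Matcher#matches and Matcher#group come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, chosen for this illustration only.
class HostnameFind {

    // Same pattern as in the question.
    private static final Pattern HOSTNAME_PATTERN =
            Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

    // Returns the text captured by group 1, or null when the input
    // does not start with http:// or https://.
    static String extractHost(String url) {
        Matcher matcher = HOSTNAME_PATTERN.matcher(url);
        // find() succeeds when the pattern matches some part of the input;
        // the ^ anchor still pins that match to the start of the string.
        return matcher.find() ? matcher.group(1) : null;
    }

    // Mirrors the original check: true only when the pattern consumes the whole input.
    static boolean matchesWhole(String url) {
        return HOSTNAME_PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
                "http://stackoverflow.com",
                "http://stackoverflow.com/",
                "http://stackoverflow.com/path",
                "https://stackoverflow.com/path/path2/?qs1=1",
        };
        for (String input : inputs) {
            System.out.println(input
                    + " | matches(): " + matchesWhole(input)
                    + " | find() group(1): " + extractHost(input));
        }
    }
}
```

Matcher#lookingAt would also work here, since it matches from the start of the input without requiring the whole string to be consumed. Note that group(1) is simply everything up to the first slash, so an input with a port (host:8080) would return the port as part of the host.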
