Elad Yosifon

Reputation: 56

Why does this code (extracting a host name from a URL with a regular expression) fail?

I'm trying to extract the host name from a URL using a regex with a capturing group. I wrote this test to simulate the acceptable inputs.

Why does this code fail?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {

    public static void main(String[] args)
    {
        Pattern HostnamePattern = Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

        String[] inputs = new String[]{

                "http://stackoverflow.com",
                "http://stackoverflow.com/",
                "http://stackoverflow.com/path",
                "http://stackoverflow.com/path/path2",
                "http://stackoverflow.com/path/path2/",
                "http://stackoverflow.com/path/path2/?qs1=1",

                "https://stackoverflow.com/path",
                "https://stackoverflow.com/path/path2",
                "https://stackoverflow.com/path/path2/",
                "https://stackoverflow.com/path/path2/?qs1=1",
        };

        for(String input : inputs)
        {
            Matcher matcher = HostnamePattern.matcher(input);
            if(!matcher.matches() || !"stackoverflow.com".equals(matcher.group(1)))
            {
                throw new Error(input+" fails!");
            }
        }

    }

}

Upvotes: 1

Views: 123

Answers (2)

nils

Reputation: 1382

Take a look at "http://stackoverflow.com/path". How should your pattern match it? Nothing in the pattern accounts for the trailing "path" part, so matches() fails on that input.
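
If you want to keep calling matches(), one option is to let the pattern consume the rest of the URL as well. The following is only a sketch: the extended pattern and the names HostnameMatchesDemo and extractHost are illustrative assumptions, not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostnameMatchesDemo {

    // Hypothetical variant of the question's pattern: the optional group
    // "(/.*)?" consumes any path and query string, so the pattern can
    // cover the whole input as matches() requires.
    private static final Pattern HOSTNAME_PATTERN =
            Pattern.compile("^https?://([^/]+)(/.*)?$", Pattern.CASE_INSENSITIVE);

    // Returns the captured host name, or null if the input does not match.
    static String extractHost(String url) {
        Matcher matcher = HOSTNAME_PATTERN.matcher(url);
        // matches() succeeds only if the pattern covers the entire input.
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractHost("http://stackoverflow.com"));
        System.out.println(extractHost("http://stackoverflow.com/path"));
        System.out.println(extractHost("https://stackoverflow.com/path/path2/?qs1=1"));
    }
}
```

With this variant, group 1 still stops at the first slash, and everything after it is absorbed by the second, optional group.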

Upvotes: 1

anubhava

Reputation: 785481

It fails because of the combination of your regex ^https?://([^/]+)/? and your call to the Matcher#matches method, which succeeds only if the pattern matches the entire input.

You need to use:

matcher.find()

Otherwise your regex will match only the first two input strings: http://stackoverflow.com and http://stackoverflow.com/
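
As a minimal sketch of that change, the class below keeps the regex from the question unchanged and only swaps matches() for find(). The names HostnameFindDemo and extractHost are illustrative and do not come from the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostnameFindDemo {

    // Same regex as in the question.
    private static final Pattern HOSTNAME_PATTERN =
            Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

    // Returns the captured host name, or null if the input does not
    // start with http:// or https://.
    static String extractHost(String url) {
        Matcher matcher = HOSTNAME_PATTERN.matcher(url);
        // find() accepts a match on part of the input; the leading "^"
        // still anchors that match to the start of the string.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
                "http://stackoverflow.com",
                "http://stackoverflow.com/",
                "http://stackoverflow.com/path/path2/?qs1=1",
                "https://stackoverflow.com/path/path2/",
        };
        for (String input : inputs) {
            String host = extractHost(input);
            if (!"stackoverflow.com".equals(host)) {
                throw new IllegalStateException(input + " fails!");
            }
            System.out.println(input + " -> " + host);
        }
    }
}
```

Because the pattern begins with ^, find() cannot pick up a match later in the string, so inputs with a different scheme are still rejected.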

Upvotes: 3
