Em Bq

Reputation: 55

Java regexp that only matches URLs without protocol and www

I need a rather greedy regex that aggressively matches strings that do not begin with a protocol such as "http://" or "ftp://" and that also do not begin with "www" (or both combined, of course). I'm fairly new to Java and regex, but I've managed to come up with this one (which doesn't work for me):

([\w'-]+)\.(com|info|net|org).+

However, it doesn't seem to match "example.com". It does seem to match "example.com/index.php?q=somequery#something". I don't really understand how to create a regex that refuses to match when the string begins with a particular series of characters, in my case "www" or "http://".

Any help is appreciated.

(P.S. I've tried to look for dupes of this question, but I couldn't find one that matches it exactly. Very sorry if this is a dupe.)

Upvotes: 0

Views: 228

Answers (1)

Sabuj Hassan

Reputation: 39395

Your regex has .+ at the end, which means any character except a line terminator such as \n (1 or more times).

But your sample example.com doesn't have anything after the .com, so the trailing .+ has nothing left to consume. That's why your regex doesn't match the sample.

Replace the .+ with .* and it will work for you. FYI, .* means any character except a line terminator such as \n (0 or more times).
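To make the difference concrete, here is a minimal sketch comparing the original pattern with the .* version. It also goes one step further than the fix above: since the question asks to reject strings starting with "www" or a protocol, the third pattern adds an anchored negative lookahead. That third pattern, the class name UrlRegexDemo, and the helper found() are my own assumptions for illustration and not something from the question, so treat it as a starting point rather than a complete URL matcher.

```java
import java.util.regex.Pattern;

class UrlRegexDemo {
    // Pattern from the question: the trailing ".+" demands at least one
    // character after the TLD, so a bare "example.com" cannot match.
    static final Pattern ORIGINAL =
        Pattern.compile("([\\w'-]+)\\.(com|info|net|org).+");

    // Suggested fix: ".*" also allows zero trailing characters.
    static final Pattern FIXED =
        Pattern.compile("([\\w'-]+)\\.(com|info|net|org).*");

    // Assumption (not in the original answer): anchor at the start and use a
    // negative lookahead to reject input that begins with "www." or with a
    // scheme such as "http://" or "ftp://".
    static final Pattern NO_PREFIX = Pattern.compile(
        "^(?!www\\.|[a-zA-Z][a-zA-Z0-9+.-]*://)([\\w'-]+)\\.(com|info|net|org).*");

    // Hypothetical helper: true if the pattern is found anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "example.com",
            "example.com/index.php?q=somequery#something",
            "www.example.com",
            "http://example.com"
        };
        for (String s : samples) {
            System.out.println(s
                + " original=" + found(ORIGINAL, s)
                + " fixed=" + found(FIXED, s)
                + " noPrefix=" + found(NO_PREFIX, s));
        }
    }
}
```

Note that with find(), the .* pattern on its own still locates "example.com" inside "www.example.com" and "http://example.com", which is why the sketch anchors the last pattern with ^ before the lookahead.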

Upvotes: 1
