Hao Ting

Reputation: 93

Website/URL Validation Regex in JAVA

I need a regex string to match URLs starting with "http://", "https://", "www." or "google.com".

The code I tried is:

//Pattern to check if this is a valid URL address
    Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
    Matcher m;
    m=p.matcher(urlAddress);

But this code can only match URLs such as "http://www.google.com".

I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirements. Will someone please help me? Thank you.

Upvotes: 7

Views: 65983

Answers (6)

KnechtRootrecht

Reputation: 493

If you use Java, I recommend this regex (I wrote it myself):

^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as a Java string

to explain:

  • ^ = line start
  • (https?://)? = "http://" or "https://" may occur.
  • (www\.)? = "www." may occur.
  • ([\w]+\.)+ = one or more word characters (\w is [a-zA-Z0-9_]) followed by a dot; this group has to occur one or more times. (Extend it here if you need special characters like ü, ä, ö or others in your URL, and remember to use IDN.toASCII(url) if you do. If you need to know which characters are legal in general, see https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url)
  • [\w]{2,63} = a word ([a-zA-Z0-9_]) with 2 to 63 characters has to occur exactly once (a TLD, or top-level domain such as .com, cannot be shorter than 2 or longer than 63 characters).
  • /? = a "/"-character may occur. (some people or servers put a / at the end... whatever)
  • $ = line end
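A minimal sketch of the pattern in use, assuming you call `Matcher.matches()` (which must match the whole input, so `^` and `$` are redundant there but harmless). The class name `DomainCheck`, the method `isUrl` and the sample inputs are my own; `\\/` is written as plain `/`, which needs no escaping in a Java regex.

```java
import java.util.regex.Pattern;

class DomainCheck {
    // The regex from this answer as a Java string literal ("\\/" simplified to "/").
    private static final Pattern URL =
            Pattern.compile("^(https?://)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}/?$");

    static boolean isUrl(String s) {
        // matches() tests the entire input, not just a substring of it
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.google.com", "https://google.com", "www.google.com",
            "google.com", "google.co.in", "google", "ftp://google.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isUrl(s));
        }
    }
}
```

The last two samples are rejected: "google" has no dot before the TLD, and "ftp://" is not one of the allowed schemes.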

-

If you extend it by special characters it could look like this:

^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as a Java string
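To illustrate the `IDN.toASCII` remark, here is a sketch (the class name and the host "bücher.de" are made up for the example) that converts a host with an umlaut to its ASCII (Punycode) form before matching. The converted form contains `-`, which `\w` does not cover, so this is a case where the extended pattern is needed. It assumes you pass only the host, since `IDN.toASCII` is meant for domain names.

```java
import java.net.IDN;
import java.util.regex.Pattern;

class ExtendedDomainCheck {
    // The extended regex from this answer. In Java, the characters between
    // \Q and \E are quoted, so "-" is a literal here rather than a range.
    private static final Pattern URL = Pattern.compile(
            "^(https?://)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}/?$");

    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String unicodeHost = "b\u00fccher.de";       // "bücher.de"
        String asciiHost = IDN.toASCII(unicodeHost);  // Punycode form, starts with "xn--"
        System.out.println(unicodeHost + " -> " + isUrl(unicodeHost)); // \w is ASCII-only
        System.out.println(asciiHost + " -> " + isUrl(asciiHost));
        System.out.println("my-site.com -> " + isUrl("my-site.com"));
    }
}
```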

Avinash Raj's answer is not fully correct.

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

The dots are not escaped, which means each one matches any character. My version is also simpler, and I have never heard of a domain like "test..com" (which actually matches...).
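A quick side-by-side check of the point about unescaped dots (the class name and inputs are mine): with plain `.`, the first pattern accepts strings such as "test..com" or "google!com", while the escaped version rejects them.

```java
import java.util.regex.Pattern;

class DotEscapingDemo {
    // Avinash Raj's pattern: the "." characters are unescaped,
    // so each one matches any single character.
    static final Pattern UNESCAPED = Pattern.compile(
            "^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    // The pattern from this answer, with every literal dot escaped as "\\.".
    static final Pattern ESCAPED = Pattern.compile(
            "^(https?://)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}/?$");

    public static void main(String[] args) {
        for (String s : new String[] {"test..com", "google!com", "google.com"}) {
            System.out.printf("%-12s unescaped=%b escaped=%b%n", s,
                    UNESCAPED.matcher(s).matches(), ESCAPED.matcher(s).matches());
        }
    }
}
```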

Demo: https://regex101.com/r/vM7wT6/279


Edit: As I saw some people needing a regex that also matches server directories, I wrote this:

^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$

While this may not be the best one, since I didn't spend much time on it, maybe it helps someone. It also matches URLs like "hello.to/test/whatever.cgi". You can see how it works here: https://regex101.com/r/vM7wT6/700
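For reference, a sketch of the same regex as a Java string literal with a few sample inputs. The class name, `isUrl`, and the example URLs other than "hello.to/test/whatever.cgi" are my own; this is only a quick check, not a test against every URL form.

```java
import java.util.regex.Pattern;

class PathUrlCheck {
    // The "server directories" regex from the edit above, split over two
    // string literals for readability; "\\/" is simplified to "/".
    private static final Pattern URL_WITH_PATH = Pattern.compile(
            "^(https?://)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+(\\w{2,63})(:\\d{1,4})?"
            + "([\\w\\Q/$-_+!*'(),%\\E]+\\.?[\\w])*/?$");

    static boolean isUrl(String s) {
        return URL_WITH_PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello.to/test/whatever.cgi", "https://example.com:8080/path", "google.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isUrl(s));
        }
    }
}
```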

Upvotes: 9

A ch b

Reputation: 1

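// I use this:

static boolean esURL(String cadena) {
    // Accepts input that starts with http://, https://, ftp://, file:// or www.
    // followed by URL characters; matches() requires the whole string to match.
    return cadena.matches("\\b(https?://|ftp://|file://|www\\.)[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
}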

Upvotes: -1

darkwinter

Reputation: 19

pattern="w{3}\.[a-z]+\.?[a-z]{2,3}(|\.[a-z]{2,3})"

This will only accept addresses like www.google.com and www.google.co.in.
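A quick way to try this pattern in Java, assuming the intent is a whole-value match (an HTML `pattern` attribute must match the entire input, which is also what `String.matches` does). The class and method names are illustrative only.

```java
class WwwOnlyCheck {
    // darkwinter's pattern as a Java string literal.
    static final String WWW_ONLY = "w{3}\\.[a-z]+\\.?[a-z]{2,3}(|\\.[a-z]{2,3})";

    static boolean isWwwAddress(String s) {
        // String.matches requires the whole string to match the pattern
        return s.matches(WWW_ONLY);
    }

    public static void main(String[] args) {
        String[] samples = {"www.google.com", "www.google.co.in", "google.com", "http://www.google.com"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWwwAddress(s));
        }
    }
}
```

Note that, unlike what the question asks for, "google.com" and "http://www.google.com" are rejected, since the input must start with "www.".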

Upvotes: 1

Raj Hassani

Reputation: 1677

You can use the Apache Commons Validator library (org.apache.commons.validator.routines.UrlValidator) to validate a URL:

String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);

And use:

 urlValidator.isValid(url) // returns true or false

Then there is no need for a regex.

Link: https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html

Upvotes: 11

raghavsood33

Reputation: 768

A Java-compatible version of @Avinash's answer would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m = p.matcher(urlAddress);
boolean matches = m.matches(); // true only if the whole string matches

Upvotes: 3

Avinash Raj

Reputation: 174836

You need to make the (http://|https://) part of your regex optional.

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

DEMO
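A sketch comparing the question's pattern with this one using `Matcher.matches()` (the class name and inputs are illustrative). The only differences are the `?` after the scheme group and the `^`/`$` anchors.

```java
import java.util.regex.Pattern;

class OptionalSchemeDemo {
    // Pattern from the question: the scheme group is mandatory.
    static final Pattern REQUIRED_SCHEME = Pattern.compile(
            "(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");

    // The same pattern with the scheme group made optional by "?" and anchored.
    static final Pattern OPTIONAL_SCHEME = Pattern.compile(
            "^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    public static void main(String[] args) {
        for (String s : new String[] {"http://www.google.com", "www.google.com", "google.com"}) {
            System.out.printf("%-22s required=%b optional=%b%n", s,
                    REQUIRED_SCHEME.matcher(s).matches(), OPTIONAL_SCHEME.matcher(s).matches());
        }
    }
}
```

With the scheme optional, "www.google.com" and "google.com" are accepted as well as "http://www.google.com".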

Upvotes: 19
