Reputation: 509
A string like 'www.test.com' is good. A string like 'www.888.com' is good. A string like 'stackoverflow.com' is good. A string like 'GOoGle.Com' is good.
Why? Because those are valid URLs. It does not necessarily matter whether they have been registered or not.
Now, bad strings are:
'goog*d\x' 'manydots...com'
Why? Because you can't register those URLs.
If I have a string in Java which is supposed to be a good URL, what's the best way to validate it?
Thanks a lot.
Upvotes: 9
Views: 4601
Reputation: 536379
Those examples are hostnames. They're not valid URLs in themselves.
Hostnames are made of .-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.
You can match this with a pattern like (assuming case-insensitive):
([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]))*\.?
However, this also matches strings like 1.2.3.4, which, although they could technically be host/domain names, will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
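As a rough illustration, the label rules above could be written in Java along these lines. This is only a sketch: the class name HostnameCheck and the helper isValidHostname are made up for the example, it uses the case-insensitive flag rather than listing both cases, and it deliberately does nothing special about IP addresses or IDNA.

```java
import java.util.regex.Pattern;

final class HostnameCheck {
    // One label: 1 to 63 letters, digits or hyphens, no hyphen at either end.
    private static final String LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";

    // Labels joined by single dots, with an optional trailing dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the whole string is a syntactically valid hostname.
    static boolean isValidHostname(String s) {
        return s != null && HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"www.test.com", "www.888.com", "GOoGle.Com",
                            "goog*d\\x", "manydots...com", "1.2.3.4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidHostname(s));
        }
    }
}
```

With this pattern the four "good" strings from the question are accepted and the two "bad" ones are rejected; 1.2.3.4 is accepted too, so an extra check is needed if dotted-decimal addresses should be treated differently.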
And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
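For the Unicode part, the JDK ships java.net.IDN, which converts an internationalized name to its ASCII form. A minimal sketch, assuming you just want the xn-- form or a failure signal: the class IdnCheck and the helper toAsciiHostname are invented for this example, and the Unicode escapes in the code spell out 例え.テスト so the source file stays pure ASCII.

```java
import java.net.IDN;

final class IdnCheck {
    // Hypothetical helper: returns the ASCII (Punycode) form of the name,
    // or null if java.net.IDN refuses to convert it.
    static String toAsciiHostname(String host) {
        try {
            // USE_STD3_ASCII_RULES asks IDN to enforce the letter/digit/hyphen rules.
            return IDN.toASCII(host, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The escaped string is the IDNA example name from the text above.
        System.out.println(toAsciiHostname("\u4f8b\u3048.\u30c6\u30b9\u30c8"));
        System.out.println(toAsciiHostname("stackoverflow.com"));
    }
}
```

The ASCII result can then be checked against the ordinary label rules, so the Unicode handling stays in one place.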
Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.
Upvotes: 3
Reputation: 22721
Use UrlValidator from the Apache Commons Validator library. Binary package: http://www.mirrorservice.org/sites/ftp.apache.org/commons/validator/binaries/commons-validator-1.3.1.zip (zip contains .jar files)
Example of usage (Construct a UrlValidator with valid schemes of "http", and "https"):
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}
prints "url is invalid"
If the default constructor is used instead:
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}
prints out "url is valid"
Upvotes: 10
Reputation: 24499
I also believe you can use the URL class in java.net:
URL url = new URL("www.google.com");
The API says:
public URL(String spec) throws MalformedURLException
Parameters:
spec - the String to parse as a URL.
Throws:
MalformedURLException - If the string specifies an unknown protocol.
So an exception is thrown if the URL is malformed, for example when the protocol is missing or unknown.
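It is worth trying the constructor on a few inputs before relying on it. The sketch below (UrlParseCheck and parses are names invented for this example) only reports whether java.net.URL accepts a string; it says nothing about whether the hostname inside could actually be registered.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlParseCheck {
    // Hypothetical helper: true if the java.net.URL constructor accepts the string.
    // The constructor is deprecated in recent JDKs, hence the annotation.
    @SuppressWarnings("deprecation")
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("www.google.com"));        // no protocol in the string
        System.out.println(parses("http://www.google.com")); // protocol the JDK knows
        System.out.println(parses("foo://www.google.com"));  // protocol the JDK does not know
    }
}
```

A bare hostname such as www.google.com has no protocol, so the constructor rejects it; the strings from the question would need a prefix like http:// before this check tells you anything.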
Upvotes: -1
Reputation: 15043
You can do this kind of "URL validation" through regular expressions.
And here is where you can get some good URL regexes (so you don't have to write your own).
Upvotes: -2
Reputation: 133577
I think that new URL(yourString) will do the trick: it is supposed to raise MalformedURLException if the URL is not compliant (actually the Java API says "If the string specifies an unknown protocol", but you can try it anyway):
try {
    new URL(string);
} catch (MalformedURLException e) {
    // do whatever
}
Upvotes: -1