Reputation: 1587
Is there a way to check whether a regular expression represents a valid URL? Say the regular expressions are Java Strings: is there a way to check whether these Strings represent valid URLs?
For example:
String s1 = "/amazon\\.com\\//";
String s2 = "/google(\\.[a-z]+)?\\.[a-z]+\\/search.*q=/i";
String s3 = "/.*/"; //Represents any URL
String s4 = "hello world";
s1, s2, and s3 are valid regular expressions that represent URLs, but s4 is not.
Thanks, Sony
Upvotes: 0
Views: 384
Reputation: 718798
It is easy to create a regex that will match specific URLs, but it is next to impossible to write one that will match any valid URL and also NOT match any invalid URLs. For a start, you have to cope with percent encoding and the rules about when it can or should be used for different characters.
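To make the percent-encoding point concrete, here is a small sketch. NAIVE_URL is a hypothetical stand-in for a typical hand-written URL regex; java.net.URI is the standard JDK class. The permissive regex accepts a malformed escape such as %zz, which URI rejects.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

public class PercentEncodingDemo {
    // Hypothetical naive URL regex: "http://" or "https://", then any non-whitespace.
    static final Pattern NAIVE_URL = Pattern.compile("https?://\\S+");

    // Returns true if java.net.URI can parse the string without a syntax error.
    static boolean uriAccepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://example.com/a%20b", // well-formed percent escape
            "http://example.com/a%zz"   // "%zz" is not a valid escape (z is not a hex digit)
        };
        for (String s : inputs) {
            System.out.println(s
                + "  naiveRegex=" + NAIVE_URL.matcher(s).matches()
                + "  uri=" + uriAccepts(s));
        }
    }
}
```

The regex reports both inputs as URLs; only the parser notices that the second one is malformed.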
I should also point out that none of your examples is a valid URL according to the URL specifications.
My advice would be to use new URL(String) or new URI(String) to check for invalid URLs, and then examine the components to perform fine-grained matching.
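A minimal sketch of that two-step approach, using java.net.URI (the URL(String) constructors are deprecated as of JDK 20). The "http/https with a host" requirement and the AMAZON_HOST rule are hypothetical examples of fine-grained checks, not anything the answer prescribes.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

public class UrlComponentCheck {
    // Hypothetical fine-grained rule: the host is amazon.com or one of its subdomains.
    static final Pattern AMAZON_HOST = Pattern.compile("(?:[a-z0-9-]+\\.)*amazon\\.com");

    // Step 1: let java.net.URI reject syntactically invalid input.
    static Optional<URI> parse(String s) {
        try {
            URI uri = new URI(s);
            // Reject relative references and URIs without a server-based host.
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) return Optional.empty();
            String lower = scheme.toLowerCase(Locale.ROOT);
            if (!lower.equals("http") && !lower.equals("https")) return Optional.empty();
            return Optional.of(uri);
        } catch (URISyntaxException e) {
            return Optional.empty(); // e.g. "hello world": a space is illegal in a URI
        }
    }

    // Step 2: match an individual component (the host) instead of the whole string.
    static boolean isAmazonUrl(String s) {
        return parse(s)
            .map(uri -> AMAZON_HOST.matcher(uri.getHost().toLowerCase(Locale.ROOT)).matches())
            .orElse(false);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://www.amazon.com/dp/123",
            "https://amazon.com.evil.example/",
            "hello world"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isAmazonUrl(s));
        }
    }
}
```

Matching on the parsed host avoids a classic whole-string regex mistake: a pattern like amazon\.com would also match inside amazon.com.evil.example.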
Upvotes: 1
Reputation: 35
It sounds like the post is asking how to determine whether a given regular expression will only match valid URLs, not whether those particular regex examples are themselves URLs.
This could probably be generalized to determining whether the language matched by a given regex is contained in the language matched by a "canonical" regex that matches all URLs. This previous question might be of some use:
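Stated a little more formally, as a sketch that assumes a hypothetical canonical regex U whose language is exactly the set of valid URLs over an alphabet Σ, and a candidate regex r:

```latex
\[
  r \text{ matches only URLs}
  \iff L(r) \subseteq L(U)
  \iff L(r) \cap \bigl(\Sigma^{*} \setminus L(U)\bigr) = \varnothing
\]
```

Regular languages are closed under complement and intersection, and emptiness of a regular language is decidable, so for true regular expressions this check can be carried out mechanically by building finite automata. Java's java.util.regex also supports backreferences, which go beyond regular languages, so the argument only covers patterns that avoid them.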
Upvotes: 0
Reputation: 52738
Either of these should match any URL (assuming that's your question; the wording is a bit cryptic):
String urlRegex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
String regexUrl = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
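A usage sketch for the two patterns above (the class and method names are hypothetical). The first is anchored only at the start with ^, so matches() is used to validate a whole string; the second starts at a word boundary, which suits find() for pulling URLs out of running text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRegexDemo {
    // Same pattern as urlRegex above; matches() requires the entire input to match.
    static final Pattern URL_REGEX = Pattern.compile(
        "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
    // Same pattern as regexUrl above; used with find() to scan inside longer text.
    static final Pattern REGEX_URL = Pattern.compile(
        "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // True if the whole string looks like a URL according to URL_REGEX.
    static boolean isUrl(String s) {
        return URL_REGEX.matcher(s).matches();
    }

    // Collects every URL-like substring found by REGEX_URL.
    static List<String> extractUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = REGEX_URL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://www.google.com/search?q=x"));
        System.out.println(isUrl("hello world"));
        System.out.println(extractUrls("See https://example.com/page, or ftp://files.example.org."));
    }
}
```

Because the final character class excludes punctuation such as , and ., trailing punctuation in prose is not swallowed. As the first answer warns, these patterns are permissive: isUrl("http://%zz") returns true even though %zz is not a valid percent-escape. They also recognize URL text, not whether another regex describes URLs.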
Upvotes: 0