sony

Reputation: 1587

Is there a way to check if a regular expression represents a URL?

Is there a way to check whether a regular expression represents a valid URL? Say the regular expressions are Java Strings: is there a way to check whether these Strings represent a valid URL?

For example:

String s1 = "/amazon\\.com\\//";
String s2 = "/google(\\.[a-z]+)?\\.[a-z]+\\/search.*q=/i";
String s3 = "/.*/"; // represents any URL
String s4 = "hello world";

s1, s2, and s3 are valid regular expressions representing URLs, but s4 is invalid.

Thanks, Sony

Upvotes: 0

Views: 384

Answers (3)

Stephen C

Reputation: 718798

It is easy to create a regex that matches specific URLs, but it is next to impossible to write one that matches every valid URL and also rejects every invalid one. For a start, you have to cope with percent-encoding and the rules about when it can or should be used for different characters.

I should also point out that none of your examples is a valid URL according to the URL specifications.


My advice would be to use new URL(String) or new URI(String) to check for invalid URLs, and then examine the components to perform fine-grained matching.
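As a rough sketch of that advice, the snippet below parses a string with java.net.URI and then inspects its components. The class name UrlCheck and the helper isAbsoluteUrl are hypothetical names chosen for illustration, and requiring both a scheme and a host is an assumption about what "valid URL" should mean here, not something the URI class enforces on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlCheck {
    // Hypothetical helper: true if the string parses as a URI that has
    // a scheme (is absolute) and a host component.
    static boolean isAbsoluteUrl(String s) {
        try {
            URI uri = new URI(s);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // The parser rejected the string (e.g. an illegal character).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteUrl("http://www.amazon.com/")); // true
        System.out.println(isAbsoluteUrl("hello world"));            // false
        System.out.println(isAbsoluteUrl("/.*/"));                   // false

        // After parsing, the individual components can be matched separately.
        URI uri = URI.create("http://www.google.com/search?q=java");
        System.out.println(uri.getHost());
        System.out.println(uri.getPath());
        System.out.println(uri.getQuery());
    }
}
```

Once the string has parsed, a small regex applied to just the host or just the path is usually much easier to get right than one regex for the whole URL.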

Upvotes: 1

kramimus

Reputation: 35

It sounds like the post is asking how to determine whether a given regular expression will match a valid URL, not whether those particular regex examples match a URL.

This could probably be generalized to determining if the language matched by a given regex can also be matched by a "canonical" regex that matches all URLs. This previous question might be of some use:

Does an algorithm exist which can determine whether one regular language matches any input another regular language matches?

Upvotes: 0

chown

Reputation: 52738

Either of these should match most common URLs with an http, https, ftp, or file scheme (assuming that's your question; the wording is a bit cryptic):

String urlRegex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
String regexUrl = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
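A minimal usage sketch for the anchored pattern above, using java.util.regex. The class name UrlRegexDemo and the helper looksLikeUrl are hypothetical, and using matches() (whole-string match) rather than find() is an assumption about how the pattern is meant to be applied. Note that the pattern is permissive: it checks the character set, not URL syntax, so it also accepts strings with a malformed percent escape.

```java
import java.util.regex.Pattern;

class UrlRegexDemo {
    // Pattern copied from the first (anchored) variant above.
    static final Pattern URL = Pattern.compile(
        "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean looksLikeUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://www.amazon.com/")); // true
        System.out.println(looksLikeUrl("hello world"));            // false
        // Accepted by the regex even though "%zz" is not a valid percent escape.
        System.out.println(looksLikeUrl("http://%zz"));             // true
    }
}
```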

Upvotes: 0
