Canvas

Reputation: 5897

Regex for valid URL characters

I am trying to check a string before saving it to my database.

Here is an example string: "Paint & Brush"

The & is invalid here. How can I use a Regex to detect it? I also want to check for other characters such as £, $, %, # etc.

I have tried this:

Regex RgxUrl = new Regex(@"[^A-Z0-9.\-\)\(]");

However, the "Paint & Brush" example from before was still reported as valid.

Upvotes: 7

Views: 18312

Answers (3)

Dirk

Reputation: 10968

I would suggest not using a regular expression for that unless you absolutely have to. Using a regular expression means you have to test and maintain it. .NET already has a Uri class which you can use to verify that a string is a valid URI.

string urlString = "Paint & Brush"; // example input from the question
Uri uri;
if (!Uri.TryCreate(urlString, UriKind.RelativeOrAbsolute, out uri)) {
    // It's not a valid URI.
}

You might want to examine the resulting Uri object further to see whether it is an HTTP one, but that should be easy.

Of course, this only tells you whether the entire string was valid, not which character was invalid. If you want to give a more detailed error message, this method won't work.
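As a minimal sketch of the "examine the resulting Uri further" step, the helper below accepts only absolute http or https URIs. The class and method names (UrlChecks, IsHttpUrl) are hypothetical, and UriKind.Absolute is used on the assumption that only full URLs are of interest; relative parsing is considerably more lenient.

```csharp
using System;

public static class UrlChecks
{
    // Hypothetical helper: true only for absolute URIs whose scheme is http or https.
    public static bool IsHttpUrl(string candidate)
    {
        Uri uri;
        // TryCreate returns false instead of throwing when parsing fails.
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

A call such as UrlChecks.IsHttpUrl(urlString) could then replace the bare TryCreate check when only web addresses should be stored.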

Upvotes: 0

huMpty duMpty

Reputation: 14470

Why not use the built-in check:

Uri.IsWellFormedUriString(stringURL, UriKind.RelativeOrAbsolute)

Read more in the documentation for the Uri.IsWellFormedUriString method, or use the Uri.TryCreate method instead.
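For illustration, here is a small sketch of how that check might be wrapped, together with Uri.EscapeDataString for the case where the raw text should be encoded rather than rejected. The class and method names (UriShapeCheck, IsWellFormed, Encode) are hypothetical, and whether encoding is appropriate depends on how the stored value is used later.

```csharp
using System;

public static class UriShapeCheck
{
    // Hypothetical helper: true when the string is already a well-formed,
    // correctly escaped URI (relative or absolute).
    public static bool IsWellFormed(string candidate)
    {
        return Uri.IsWellFormedUriString(candidate, UriKind.RelativeOrAbsolute);
    }

    // Hypothetical helper: percent-encodes the raw text so it can be used
    // as a single path segment or query value.
    public static string Encode(string raw)
    {
        return Uri.EscapeDataString(raw);
    }
}
```

With this in place, a string that fails IsWellFormed can either be rejected or passed through Encode before it is saved.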

Upvotes: 4

Superbest

Reputation: 26612

Validating URLs is a common problem, so you should first consider using the available tools to do it instead of reinventing the wheel. Nevertheless, from Wikipedia:

Unreserved

May be encoded but it is not necessary

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~

Reserved

Have to be encoded sometimes

! * ' ( ) ; : @ & = + $ , / ? % # [ ]

Further details can for example be found in RFC 3986 and http://www.w3.org/Addressing/URL/uri-spec.html.

Based on this, your pattern would be [^-\]_.~!*'();:@&=+$,/?%#\[A-Za-z0-9]. The letters are listed as A-Za-z because a single A-z range would also cover the punctuation that sits between Z and a in ASCII ([, \, ], ^, _ and `). You want to see whether this negated pattern matches any character in your string; if it does, those are probably special characters that must be encoded.

Code adapted from RegexBuddy output:

bool hasInvalidChars = false;
try {
    // True if any character falls outside the unreserved and reserved sets.
    hasInvalidChars = Regex.IsMatch(urlToTest, @"[^-\]_.~!*'();:@&=+$,/?%#\[A-Za-z0-9]");
} catch (ArgumentException) {
    // Syntax error in the regular expression
}
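If the goal is to report which characters are the problem rather than just a yes/no answer, a sketch along the following lines may help. It uses a negated character class built from the unreserved and reserved sets quoted above, with explicit A-Z and a-z ranges; the class and method names (UrlCharCheck, FindDisallowed) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlCharCheck
{
    // Matches any single character outside the unreserved and reserved sets.
    private static readonly Regex Disallowed =
        new Regex(@"[^-\]_.~!*'();:@&=+$,/?%#\[A-Za-z0-9]");

    // Hypothetical helper: returns each distinct disallowed character,
    // in order of first appearance.
    public static List<char> FindDisallowed(string input)
    {
        var found = new List<char>();
        foreach (Match m in Disallowed.Matches(input))
        {
            char c = m.Value[0];
            if (!found.Contains(c))
            {
                found.Add(c);
            }
        }
        return found;
    }
}
```

Note that reserved characters such as & pass this check, since they are legal in a URL. If they must be rejected outright, as in the "Paint & Brush" example, they would need to be removed from the character class.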

Upvotes: 8
