Miguel

Reputation: 1177

Excluding certain patterns in a regex

I'm working on a Regex in C# to exclude certain patterns within a string.

The patterns I want to accept are "%00" (a % followed by two hex digits, 00-FF) and any other character that isn't a '%'. The patterns I want to exclude are "%0" (a % followed by only one character) and the characters &<>'/.

So far I have this

Regex correctStringRegex = new Regex(@"(%[0-9a-fA-F]{2})|[^%&<>'/]|(^(%.))", 
                                     RegexOptions.IgnoreCase);

Below are examples of what I'm trying to pass and reject.

String that should pass: %02This is%0A%0Da string%03
String that should be rejected: %0%0Z%A&<%0a%

If a string fails any of these requirements, I want to reject the whole string.

Any help would be greatly appreciated!

Upvotes: 1

Views: 5878

Answers (2)

Nevyn

Reputation: 2683

Hmm, given the comments so far, I think you need a different problem definition. You want to pass or fail a string, using regex, based on whether or not the string contains any invalid patterns. I'm assuming a string fails if it contains ANY invalid pattern, rather than the reverse, where a string passes if it contains any valid pattern.

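As such, I would use this regex (with RegexOptions.IgnoreCase, as in your code, so that uppercase hex digits such as %0A are accepted): %(?![0-9a-f]{2})|[&<>'/]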

You would then use it so that a string is invalid if you GET a match; a valid string produces no matches at all.

A quick explanation of this rather odd regex: (?!...) is a negative lookahead. It succeeds only if the text right after the current position does NOT match what is inside it, and it doesn't consume any characters. So %(?![0-9a-f]{2}) matches any % that is not followed by two hex characters, and [&<>'/] matches any of the other invalid characters. The assumption is that anything this regex DOESN'T match is a valid character entry.
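A minimal C# sketch of this "blacklist" approach, assuming the pattern is compiled with RegexOptions.IgnoreCase as in the question. The class and method names (BlacklistValidator, IsValid) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the pattern describes what is NOT allowed,
// so a string is valid only when the pattern finds no match anywhere in it.
public static class BlacklistValidator
{
    // A '%' not followed by two hex digits, or one of the forbidden characters.
    // IgnoreCase lets [0-9a-f] also accept A-F.
    private static readonly Regex Invalid = new Regex(
        @"%(?![0-9a-f]{2})|[&<>'/]",
        RegexOptions.IgnoreCase);

    public static bool IsValid(string input) => !Invalid.IsMatch(input);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BlacklistValidator.IsValid("%02This is%0A%0Da string%03")); // True
        Console.WriteLine(BlacklistValidator.IsValid("%0%0Z%A&<%0a%"));               // False
    }
}
```

A single match anywhere is enough to reject the string, so the regex never needs to describe the valid parts at all.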

Upvotes: 1

Tim Pietzcker

Reputation: 336138

I suggest this:

^(?:%[0-9a-f]{2}|[^%&<>'/])*$

Explanation:

^             # Start of string
(?:           # Match either
 %[0-9a-f]{2} # %xx
|             # or
 [^%&<>'/]    # any character except the forbidden ones
)*            # any number of times
$             # until end of string.

This ensures that % is only matched when it is followed by two hexadecimal digits. Since you're already compiling the regex with the IgnoreCase flag set, you don't need a-fA-F either.
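A minimal C# sketch of this "whitelist" approach, again assuming RegexOptions.IgnoreCase. The names WhitelistValidator and IsValid are hypothetical, used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the anchored pattern must describe the entire string,
// so any forbidden piece makes the whole match fail.
public static class WhitelistValidator
{
    // Zero or more of: %xx (two hex digits), or any single non-forbidden character.
    private static readonly Regex Valid = new Regex(
        @"^(?:%[0-9a-f]{2}|[^%&<>'/])*$",
        RegexOptions.IgnoreCase);

    public static bool IsValid(string input) => Valid.IsMatch(input);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(WhitelistValidator.IsValid("%02This is%0A%0Da string%03")); // True
        Console.WriteLine(WhitelistValidator.IsValid("%0%0Z%A&<%0a%"));               // False
    }
}
```

Because the pattern is anchored with ^ and $, IsMatch succeeds only when every character is accounted for by one of the two alternatives, which is what rejects the whole string as soon as one bad piece appears.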

Upvotes: 1
