Kartikeya Khosla

Reputation: 18873

Disallow specific special characters in regex

I have the following regex:

[\u00BF-\u1FFF\u2C00-\uD7FF\w \&quot;"",.()/-<br\s/?>]+$

It allows characters of any language but excludes special characters such as # and * (some special characters are still allowed, as you can see in the regex above).

However, my regex also allows unwanted special characters such as <, > and &.

How should I modify this regex to disallow these characters in the input string?

Upvotes: 1

Views: 1301

Answers (1)

Wiktor Stribiżew

Reputation: 626747

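You need to use alternation for some parts of the regex: inside a character class, <br\s/?> is treated as the separate characters <, b, r, etc., and &quot; likewise puts a literal & into the class. In addition, /-< creates a range from / (U+002F) to < (U+003C), so it accepts more characters than you think (the digits, :, ; and <).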
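To make the effect of the /-< range concrete, here is a minimal sketch that lists the printable ASCII characters matched by the class [/-<]. The RangeDemo class and the MatchedAscii helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // Returns every printable ASCII character (U+0020..U+007E)
    // that the given character class matches.
    public static string MatchedAscii(string characterClass)
    {
        var regex = new Regex(characterClass);
        var matched = Enumerable.Range(0x20, 0x7F - 0x20)
            .Select(code => (char)code)
            .Where(c => regex.IsMatch(c.ToString()));
        return new string(matched.ToArray());
    }

    public static void Main()
    {
        // Inside a class, "/-<" is the range U+002F..U+003C.
        Console.WriteLine(MatchedAscii("[/-<]")); // prints /0123456789:;<
    }
}
```

The digits are already covered by \w, so the only characters the range adds on top of / are :, ; and <.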

Thus, I suggest using

^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w ",.()/:;-]|&quot;|<br\s?/?>)+$

In C#, using a verbatim string literal:

@"^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w "",.()/:;-]|&quot;|<br\s?/?>)+$"

See demo on regexstorm

I am assuming you need to match any of these three "entities" or a combination of them:

  • [\u00BF-\u1FFF\u2C00-\uD7FF\w ",.()/:;-] - The character ranges \u00BF-\u1FFF and \u2C00-\uD7FF, \w, a space, a double quote, ,, ., (, ), /, :, ; and a literal hyphen
  • &quot; - A literal &quot;
  • <br\s?/?> - <br> tags (this can match <br>, <br/> and <br />).

^ and $ anchor the match at the beginning and end of the string, so the whole input must consist only of these alternatives.
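As a rough sketch of how the suggested pattern could be used for validation in C#, the snippet below checks a few sample inputs. The InputValidator class, the IsAllowed method and the sample strings are assumptions made for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidator
{
    // Pattern from the answer as a verbatim string literal
    // ("" inside @"..." stands for one double quote).
    private static readonly Regex Allowed = new Regex(
        @"^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w "",.()/:;-]|&quot;|<br\s?/?>)+$");

    // True when the whole input consists only of allowed characters,
    // &quot; entities and <br> tags.
    public static bool IsAllowed(string input) => Allowed.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "Hello, world (test)",     // True
            "line one<br />line two",  // True: <br /> is matched as a unit
            "say &quot;hi&quot;",      // True: &quot; is matched as a unit
            "a < b",                   // False: bare <
            "Tom & Jerry",             // False: bare &
            "#hashtag"                 // False: # is not allowed
        };

        foreach (var sample in samples)
            Console.WriteLine($"{sample} -> {IsAllowed(sample)}");
    }
}
```

Note that in .NET, $ also matches just before a final newline; if a trailing newline must be rejected as well, \z can be used in place of $.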

Upvotes: 4