Reputation: 18873
I have the following regex:
[\u00BF-\u1FFF\u2C00-\uD7FF\w \""",.()/-<br\s/?>]+$
It allows characters of any language except special characters like #, *, etc. (although some special characters are allowed, as you can see in the regex above).
However, my regex also allows unwanted special characters like <, > and &.
How should I modify this regex to disallow these characters in the input string?
Upvotes: 1
Views: 1301
Reputation: 626747
You need to use alternation for some of the regex parts: inside a character class, <br\s/?> is treated as the separate characters <, b, r, etc. Also, /-< is creating a range that accepts many more characters than you think: it covers /, the digits 0-9, :, ; and <.
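To see what the accidental range lets through, here is a small sketch. It uses Python's re module purely as a convenient way to run the class (my choice, not part of the question), and the variable names are made up for the illustration. A range inside a character class is defined by code points, so this part should behave the same way in .NET.

```python
import re

# Only the part of the original class that forms the unintended range.
accidental_range = re.compile(r"[/-<]")

# Collect every ASCII character the range accepts, in code point order.
accepted = "".join(
    chr(code) for code in range(128) if accidental_range.fullmatch(chr(code))
)
print(accepted)  # /0123456789:;<
```

That is why the suggested class lists /, : and ; explicitly and moves the hyphen to the end, where it is a literal.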
Thus, I suggest using
^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w ",.()/:;-]|"|<br\s?/?>)+$
In C#, using a verbatim string literal:
@"^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w "",.()/:;-]|""|<br\s?/?>)+$"
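If you want to try the pattern quickly outside of C#, something like the following Python sketch should do. The is_valid helper and the sample strings are made up for this example, and Python's \w is not defined exactly like .NET's, so treat it as a rough check rather than a guarantee of identical behavior.

```python
import re

# The suggested pattern; the raw string passes the backslashes to the regex engine.
pattern = re.compile(r'^(?:[\u00BF-\u1FFF\u2C00-\uD7FF\w ",.()/:;-]|"|<br\s?/?>)+$')


def is_valid(text: str) -> bool:
    """Return True when the whole input consists of allowed pieces (hypothetical helper)."""
    return pattern.search(text) is not None


samples = [
    "Hello, world (test) 1/2",  # plain text with allowed punctuation
    "Привет мир",               # non-Latin letters
    "line one<br />line two",   # a complete <br /> tag
    "a & b",                    # & is not allowed
    "a<b",                      # a lone < is not allowed
    "#tag",                     # # is not allowed
]
for sample in samples:
    print(repr(sample), is_valid(sample))
```

The first three samples are accepted and the last three are rejected.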
I am assuming you need to match any of the 3 "entities" or their combinations:
[\u00BF-\u1FFF\u2C00-\uD7FF\w ",.()/:;-] - the character ranges \u00BF-\u1FFF and \u2C00-\uD7FF, \w, a space, a double quote, a comma, a dot, (, ), /, :, ; and a literal hyphen
" - a literal double quote
<br\s?/?> - <br> tags (this can match <br>, <br/> and <br />)
^ and $ will force matching at the beginning and end of the string.
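The anchors matter more than they may seem. Without ^, a pattern of the form [...]+$ only has to match some tail of the input when it is used for searching, which would explain how a character like & can slip through. A minimal sketch, again in Python and with a deliberately simplified class (both are assumptions for brevity, not the real pattern):

```python
import re

# Simplified stand-ins for the real character class: word characters and spaces only.
tail_only = re.compile(r"[\w ]+$")      # no ^, so only the end of the input is constrained
whole_string = re.compile(r"^[\w ]+$")  # anchored at both ends

text = "a & b"

tail_match = tail_only.search(text)
print(repr(tail_match.group()))   # ' b', the part after the & is enough to succeed
print(whole_string.search(text))  # None, because the & breaks the full match
```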
Upvotes: 4