Reputation: 1286
I have a regular expression to check for valid identifiers in a script language. These start with a letter or underscore, and can be followed by 0 or more letters, underscores, digits and $ symbols. However, if I call
Util.IsValidIdentifier( "hello\n" );
it returns true. My regex is
const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_\$]*$";
so how does the "\n" get through?
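A minimal C# repro of the behavior described above. The question does not show `Util.IsValidIdentifier`, so the `IsValidIdentifier` method below is a hypothetical stand-in that just runs the question's pattern through `Regex.IsMatch`. The class name `IdentifierRepro` is also made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's Util.IsValidIdentifier:
// it only applies the question's pattern with Regex.IsMatch.
public static class IdentifierRepro
{
    // Pattern copied verbatim from the question.
    public const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_\$]*$";

    public static bool IsValidIdentifier(string s) =>
        Regex.IsMatch(s, IDENTIFIER_REGEX);

    public static void Main()
    {
        Console.WriteLine(IsValidIdentifier("hello"));     // True
        Console.WriteLine(IsValidIdentifier("hello\n"));   // True: $ also matches just before a final \n
        Console.WriteLine(IsValidIdentifier("hello\n\n")); // False: only one trailing \n is tolerated
        Console.WriteLine(IsValidIdentifier("hel\nlo"));   // False: \n in the middle is still rejected
    }
}
```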
Upvotes: 4
Views: 96
Reputation: 89557
Your result is true with hello\n because you don't need to escape the $ inside a character class: the backslash you wrote there is taken as a literal, so it is matched as well.
Try this:
const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_$]*$";
Since you are testing variable names that fit on one line, you can use $ as the end of the string.
Upvotes: 0
Reputation: 2930
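The $ anchor matches at the end of the text and also just before a trailing newline (and, with RegexOptions.Multiline, at the end of every line). You need to use \z to match only the very end of the text; \z behaves the same whether or not RegexOptions.Multiline is set. You might also want to use \A instead of ^ to match the beginning of the text rather than the beginning of a line.
Also, you don't need to escape the $ in the character class.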
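A sketch of the \A/\z version this answer recommends, assuming the check is done with `System.Text.RegularExpressions.Regex`. The class name `IdentifierCheck` and the use of `RegexOptions.Compiled` are illustrative choices, not taken from the question. The last line checks that `\$` inside a class is just an escaped `$` and does not add a backslash to the set.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative fix: \A and \z anchor to the absolute start and end of the input,
// so a trailing newline can no longer slip through.
public static class IdentifierCheck
{
    // $ is an ordinary character inside [...], so it needs no escaping there.
    private static readonly Regex Identifier =
        new Regex(@"\A[A-Za-z_][A-Za-z0-9_$]*\z", RegexOptions.Compiled);

    public static bool IsValidIdentifier(string s) => Identifier.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValidIdentifier("hello"));   // True
        Console.WriteLine(IsValidIdentifier("hello\n")); // False
        Console.WriteLine(IsValidIdentifier("_a1$"));    // True
        Console.WriteLine(IsValidIdentifier("1abc"));    // False: must start with a letter or _

        // The escaped form [..\$] accepts $ but not a backslash.
        Console.WriteLine(Regex.IsMatch("\\", @"\A[A-Za-z0-9_\$]\z")); // False
    }
}
```

Because \A and \z ignore RegexOptions.Multiline, this pattern keeps rejecting "hello\n" even if that option is added later.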
Upvotes: 5
Reputation: 133567
Because $ is a metacharacter that means the end of the string (or the end of the line, just before the newline). From MSDN:
$: The match must occur at the end of the string or before \n at the end of the line or string.
You should escape it: \$ (and add \z if you want to match the end of the string there).
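To see the quoted MSDN rule in action, this sketch lists every index where an anchor matches in a two-line string, with and without RegexOptions.Multiline. The helper name `Positions` and the class `AnchorPositions` are hypothetical. Without Multiline, $ matches both before the final \n and at the very end, which is why "hello\n" passes the original pattern; \z matches only at the very end.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reports the index of every (zero-width) anchor match.
public static class AnchorPositions
{
    public static string Positions(string input, string pattern,
                                   RegexOptions options = RegexOptions.None)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern, options))
            indexes.Add(m.Index);
        return string.Join(",", indexes);
    }

    public static void Main()
    {
        string input = "ab\ncd\n"; // length 6; newlines at indexes 2 and 5
        Console.WriteLine(Positions(input, "$"));                           // 5,6
        Console.WriteLine(Positions(input, "$", RegexOptions.Multiline));   // 2,5,6
        Console.WriteLine(Positions(input, @"\z"));                         // 6
        Console.WriteLine(Positions(input, @"\z", RegexOptions.Multiline)); // 6
    }
}
```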
Upvotes: 1